A method for recognizing an object image comprises the steps of extracting a candidate for a predetermined
object image from an image, and making a judgment as to whether the extracted candidate for the
predetermined object image is or is not the predetermined object image. The candidate for the predetermined object
image is extracted by causing the center point of a view window, which has a predetermined size, to travel to 
the position of the candidate for the predetermined object image, and determining an extraction area in 
accordance with the size and/or the shape of the candidate for the predetermined object image, the center point 
of the view window being taken as a reference during the determination of the extraction area. A learning method 
for a neural network comprises the steps of extracting a target object image, for which learning operations are to 
be carried out, from an image, feeding a signal, which represents the extracted target object image, into a neural 
network, and carrying out the learning operations of the neural network in accordance with the input target object 
image. 

BACKGROUND OF THE INVENTION
Field of the Invention 

This invention relates to a method for recognizing the presence or absence of a predetermined object
image in an image. This invention particularly relates to a method for recognizing an object image wherein,
during image information processing, a judgment is made as to whether a candidate for a predetermined
object image, which candidate has been extracted from an image, is or is not the predetermined object
image. This invention also relates to a learning method for a neural network, wherein a target object image,
for which the learning operations are to be carried out, is extracted from an image, and the learning
operations of a neural network for carrying out recognition of a predetermined object image are carried out 
with respect to the extracted target object image. 

Description of the Prior Art 

A human being views an image and recognizes what the thing embedded in the image is. It is known 
that this action can be divided into two steps. A first step is to carry out "discovery and extraction" by 
moving the viewpoint, setting a target of recognition at the center point of the visual field, and at the same 
time finding the size of the object. A second step is to make a judgment from the memory and knowledge
of the human being as to what the object present at the viewpoint is. Ordinarily, human beings iterate the
two steps and thereby acquire information about the outer world. 

On the other hand, in conventional techniques for recognizing a pattern by carrying out image 
processing, typically in pattern matching techniques, importance is attached only to the second step. 
Therefore, various limitations are imposed on the first step for "discovery and extraction." For example, it is
necessary for a human being to intervene in order to cut out a target and normalize the size of the target.
Also, as in the cases of automatic reading machines for postal code numbers, it is necessary for a target 
object to be placed at a predetermined position. As pattern recognizing techniques unaffected by a change 
in size and position of a target, various techniques have been proposed wherein a judgment is made from 
an invariant quantity. For example, a method utilizing a central moment, a method utilizing a Fourier
descriptor, and a method utilizing a mean square error have been proposed. With such methods,
for the purposes of recognition, it is necessary to carry out complicated integrating operations or coordinate 
transformation. Therefore, extremely large amounts of calculations are necessary in cases where it is 
unknown where a target object is located or in cases where a large image is processed. Also, with these 
methods, in cases where a plurality of object images are embedded in an image, there is the risk that their
coexistence causes noise to occur and causes errors to occur in recognizing the object images. Thus
these methods are not satisfactory in practice.

As a model, which is unaffected by the size of a target object or by a shift in position of a target object 
and which can accurately recognize the target object, a model utilizing a neocognitron, which is one of the
techniques for neural networks, has been proposed. The neocognitron is described by Fukushima in
"Neocognitron: A Neural Network Model for a Mechanism of Pattern Recognition Unaffected by Shift in
Position," Collected Papers of The Institute of Electronics and Communication Engineers of Japan, A.J62-A-(10),
pp. 658-665, Oct. 1979. Neural networks constitute one of the research techniques for neural information
processing, which is referred to as the constructive method and which aims at clarifying the information
processing principle of a brain by constructing an appropriate neural circuitry model with full consideration
given to the facts known physiologically and results of research, investigating the actions and performance
of the model, and comparing the actions and performance of the model with those of the actual human 
brain. Research has been conducted to develop various models, such as visual models, learning models, 
and associative memory models. In particular, the neocognitron model is tolerant of a shift in position of an 
object image embedded in an image. The neocognitron carries out pattern matching and self-organizing
learning operations on a small part of a target object image, assimilates a shift in position at several stages
with a layered architecture, and thereby tolerates the shift in position. 

In the neocognitron, the operation for tolerating a shift in position of a feature little by little at several 
stages plays an important role in eliminating adverse effects of a shift in position of an input pattern and 
carrying out pattern recognition tolerant of a deformation of the input pattern. Specifically, adverse effects of
shifts in position between local features of an input pattern, which shifts are due to various deformations,
such as enlargement and reduction, of the input pattern, are assimilated little by little during the process for 
putting the features together. Ultimately, an output can be obtained which is free of adverse effects of 
comparatively large deformation of the input pattern. 

As described above, the neocognitron is based on the principle that the pattern matching is carried out
on a small part of a target object, and a shift in its position is assimilated at several stages through a 
layered architecture. However, with such a principle, a limitation is naturally imposed on achievement of 
both the accurate recognition and the assimilation of the shift in position. It has been reported, for example,
by Nagano in "Neural Net for Extracting Size Invariant Features," Computrol, No. 29, pp. 26-31, that the
neocognitron can ordinarily tolerate only an approximately fourfold fluctuation in size. As for the shift in
position, the neocognitron can tolerate only a shift of approximately two or three times the size of a target object. The
tolerance capacity remains the same also in a recently proposed neocognitron model which is provided with
a selective attention mechanism.

How the visual function of a human being carries out the first step has not yet been clarified. On the
other hand, how the viewpoint moves has been clarified to some extent as described, for example, by
Okewatari in "Visual and Auditory Information Processing in Living Body System," Information Processing, 
Vol. 23, No. 5, pp. 451-459, 1982, or by Sotoyama in "Structure and Function of Visual System", 
Information Processing, Vol. 26, No. 2, pp. 108-116, 1985. It is known that eyeball movements include a
saccadic movement, a follow-up movement, and an involuntary movement. Several models that simulate these
eye movements have been proposed. For example, a model in which the viewpoint is moved to the side of
a larger differential value of an image is proposed by Nakano in "Pattern Recognition Learning
System," Image Information (I), 1987/1, pp. 31-37, and by Shiratori, et al. in "Simulation of Saccadic
Movement by Pseudo-Retina Mask," Television Engineering Report, ITEJ Tec. Rep. Vol. 14, No. 36, pp. 25-30,
ICS' 90-54, ALPS 1 90-46, June 1990. Also, a model in which the viewpoint is moved to the side of a
higher lightness is proposed, for example, by Hirahara, et al. in "Neural Net for Specifying a Viewpoint,"
Television Engineering Report, ITEJ Tec. Rep. Vol. 14, No. 33, pp. 25-30, VAr 90-28, June 1990.
Additionally, a model in which the viewpoint is moved to a point of a contour having a large curvature is
proposed, for example, by Inui, et al. in Japanese Unexamined Patent Publication No. 2(1990)-138677.
However, these proposed models are rather simple and do not simulate the human visual function well.

Also, for the purposes of finding a target of recognition and extracting a region including the whole 
target, instead of adhering only to local features of the target object, it is necessary that the movement of 
the viewpoint becomes stable (stationary) at the center point of the whole target. However, with the 
aforesaid conventional models, such an operation for stabilizing the viewpoint cannot be carried out. For
example, with the model proposed by Shiratori, et al. wherein the pseudo-retina mask is utilized, the
viewpoint moves forward and backward around the contour line of an object and does not become stable.
Also, with the model proposed by Inui, et al., the viewpoint can ultimately catch only a feature point at a
certain limited part of an object. Additionally, most of the aforesaid conventional models require, as a tacit
precondition, that the background of an object be simple. Thus most of the aforesaid conventional models
cannot be applied to natural images, such as ordinary photographic images.

As described above, various techniques have been proposed which enable satisfactory recognition of a
target in cases where a human being intervenes in order to assimilate a shift in position of the target or a
change in the size of the target or in cases where the position and the size of the target are normalized in
advance. However, no satisfactory technique has yet been proposed with which the whole target object image
can be extracted from an image for the purposes of recognizing the object image.

As described above, human beings extract an appropriate size of a target of recognition from an image 
of the outer world and thereafter efficiently carry out recognition processing. On the other hand, with the 
conventional methods, an attempt is made to recognize a target in an image of the outer world only with a 
single processing system. Therefore, problems occur in that very complicated procedures and a very long
time are required. Also, problems occur in that it is necessary for a human being to intervene in the
extraction of the target, or it is necessary for the background of the image to be simple. Accordingly, the
conventional methods are not satisfactory in practice. These problems occur because no efficient method
has heretofore been available for extracting a target object, which is to be recognized, from an image of the
outer world, and the structure of the recognition system is such that a heavy burden is imposed on a
judgment means of the recognition system.

Also, in cases where the technique is used which is unaffected by a shift in position and which 
accurately recognizes an object image, appropriate self-organizing learning operations must be carried out 
on the neural network, such as the neocognitron, and a neural network suitable for the recognition of the 
object image must thereby be built up. 

However, if substantially identical object images differ in size from one another or include an object
image, for which the learning operations of the neural network need not be carried out, a disturbance will be
caused in the classification into categories during the learning operations, i.e., during the creation of
synaptic connections in the neural network. As a result, appropriate learning operations cannot be carried
out. Therefore, when the learning operations of the neural network, such as the neocognitron, are carried
out, it is necessary for a human being to intervene in order to extract a target object image, for which the
learning operations are to be carried out, to normalize the extracted target object image into an appropriate
size, and to feed only the necessary information to the neural network. Considerable time and labor are
required to carry out such intervening operations.

SUMMARY OF THE INVENTION 

The primary object of the present invention is to provide a method for recognizing an object image
wherein, during pattern recognition, a candidate for a predetermined object image is extracted
appropriately, an appropriate judgment is made as to whether the extracted candidate for the predetermined
object image is or is not the predetermined object image, and the time required for operations is kept short.

Another object of the present invention is to provide a method for recognizing an object image, wherein
a predetermined object image is extracted appropriately and accurately from a given image.

A further object of the present invention is to provide a method for recognizing an object image,
wherein an accurate judgment is made as to whether a given candidate for a predetermined object image is
or is not the predetermined object image.

A still further object of the present invention is to provide a method for recognizing an object image 
wherein, even if the background of a candidate for a predetermined object image in an image is 
complicated, the candidate for the predetermined object image is extracted appropriately.

Another object of the present invention is to provide a method for recognizing an object image, wherein 
the judgment performance of a system, which makes a judgment as to whether a candidate for a 
predetermined object image is or is not the predetermined object image, is kept high. 

A further object of the present invention is to provide a method for recognizing an object image, 
wherein judgments as to whether feature parts of a predetermined object image are or are not included in
feature parts of a candidate for the predetermined object image are made appropriately regardless of a 
change in the angle of the object image and a difference among object images. 

A still further object of the present invention is to provide a learning method for a neural network, 
wherein a target object image, for which the learning operations of a neural network are to be carried out, is 
automatically normalized and extracted, and the learning operations of the neural network are carried out
efficiently. 

Another object of the present invention is to provide a learning method for a neural network, wherein a 
target object image is extracted automatically from an image, the extracted target object image is classified 
in an arranged form, and learning operations are thereby carried out. 

A further object of the present invention is to provide a method for recognizing an object image and a
learning method for a neural network, which enable the operation scale to be kept small. 

A still further object of the present invention is to provide a method for recognizing an object image and 
a learning method for a neural network, in which extraction and judgment processes are carried out
simultaneously and which enable processing to be carried out very quickly with special hardware
functions.

Another object of the present invention is to provide a method for recognizing an object image and a 
learning method for a neural network, wherein a view window of an input device is caused to travel to an 
object image, which shows a movement different from the movement of the background of the object 
image. 

The present invention provides a method for recognizing an object image, which comprises the steps
of:

i) extracting a candidate for a predetermined object image from an image, and 

ii) making a judgment as to whether the extracted candidate for the predetermined object image is or is 
not the predetermined object image. 

In a first method for the aforesaid extraction, the extraction of the candidate for the predetermined
object image is carried out by: 

a) causing the center point of a view window, which has a predetermined size, to travel to the position of 
the candidate for the predetermined object image, and 

b) determining an extraction area in accordance with the size and/or the shape of the candidate for the
predetermined object image, the center point of the view window being taken as a reference during the
determination of the extraction area.

In a second method for the aforesaid extraction, the extraction of the candidate for the predetermined
object image is carried out by:

a) cutting out an image, which falls in the region inside of a view window having a predetermined size,
from the image, 

b) finding azimuths and intensities of components, such as a color and contour lines, of the candidate for
the predetermined object image with respect to the center point of the view window, the azimuths and
the intensities being found as azimuth vectors from a movement of the whole cut-out image or of an
entire complex-log mapped image, which is obtained from transformation of the cut-out image with
complex-log mapping, the color of the candidate for the predetermined object image included in the
cut-out image, and/or tilts of the contour lines of the candidate for the predetermined object image included
in the cut-out image,

c) composing a vector from the azimuth vectors, a vector for the travel of the view window being thereby
determined,

d) causing the center point of the view window to travel in accordance with the vector for the travel of the 
view window, and 

e) determining an extraction area in accordance with the size and/or the shape of the candidate for the
predetermined object image, the center point of the view window being taken as a reference during the
determination of the extraction area.
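
The composition of steps b) through d) can be illustrated with a minimal sketch. It assumes that each azimuth vector is already available as an (azimuth in radians, intensity) pair taken with respect to the center point of the view window. The function names, the optional weights, and the gain are hypothetical and serve only as an illustration under these assumptions.

```python
import math

def compose_travel_vector(azimuth_vectors, weights=None):
    """Sum (azimuth, intensity) pairs into one (dx, dy) travel vector.

    azimuth_vectors: list of (angle_in_radians, intensity) pairs (assumed format).
    weights: optional per-vector weights; equal weights are assumed if omitted.
    """
    if weights is None:
        weights = [1.0] * len(azimuth_vectors)
    dx = sum(w * r * math.cos(a) for (a, r), w in zip(azimuth_vectors, weights))
    dy = sum(w * r * math.sin(a) for (a, r), w in zip(azimuth_vectors, weights))
    return dx, dy

def move_center(center, travel_vector, gain=1.0):
    """Shift the view-window center along the travel vector (gain is hypothetical)."""
    return center[0] + gain * travel_vector[0], center[1] + gain * travel_vector[1]
```

In this sketch, azimuth vectors of equal intensity that point in opposite directions cancel, so the composed travel vector vanishes when the components of the candidate surround the center point evenly.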

In a third method for the aforesaid extraction, the extraction of the candidate for the predetermined
object image is carried out by:

a) taking the vectors for the travel of the view window, which are determined with the aforesaid second
method for the extraction, as gradient vectors of a potential field, recording the gradient vectors of the
potential field on the whole image, and thereby creating a map of the potential field, and

b) determining an extraction area in accordance with the size and/or the shape of the candidate for the 
predetermined object image, a minimum point of the potential in the map being taken as a reference 
during the determination of the extraction area. 

What the term "potential field" as used herein means will be described hereinbelow. When a human
being views an image, he will look around the image and will move his viewpoint to a predetermined object
image embedded in the image (e.g., to a face image in cases where an image constituted of a human face
image and a background representing the sky is viewed). Thereafter, he will recognize that the thing present
at the viewpoint is the face image. When the viewpoint is currently located at a position spaced apart from
the predetermined object image, it is necessary for the viewpoint to be moved a long distance towards the
predetermined object image on the image. When the viewpoint is currently located at a position near the
predetermined object image, the viewpoint needs to travel only a short distance in order to reach the
predetermined object image. At the position of the predetermined object image, the viewpoint becomes
stable. Specifically, if the direction and the amount in which the viewpoint is to travel are expressed as a
vector for the travel of the viewpoint, the vector for the travel of the viewpoint will represent the direction of
the viewpoint stabilizing point and the amount of travel thereto, which are taken from the current position of
the viewpoint. At the viewpoint stabilizing point, i.e., at the center point of the predetermined object image,
the vector for the travel of the viewpoint is zero. If it is considered that the image has a "field" of stability of
the viewpoint, the "field" is flat at the viewpoint stabilizing point and is inclined at a point, at which the
viewpoint is unstable and from which the viewpoint is required to travel in order to become stable. As
described above, it can be regarded that the vector for the travel of the viewpoint represents the gradient of 
the "field." Also, it can be regarded that the travel of the viewpoint is equivalent to the movement to the 
side of a lower potential in the "field." The "field" of stability of the viewpoint is herein referred to as the 
"potential field." 

As described above, the map of the potential field over the whole image is created from gradient
vectors of the potential field, which are calculated at respective positions of the whole image. Specifically,
the vectors for the travel of the human viewpoint are taken as the gradient vectors of the potential field, and
it is regarded that the potential field is inclined in the direction in which each gradient vector of the
potential field is directed. The gradients of the field are recorded on the whole image such that the gradient
vector of the potential field may be zero, i.e., the potential field may be minimum, at the center point of the
candidate for the predetermined object image. From the map created in this manner, it can be understood
easily which path the viewpoint at a current position on the image will follow in order to fall into the
minimum point of the potential field. The extraction area is determined in accordance with the size and/or
the shape of the candidate for the predetermined object image by taking the minimum point of the potential
as a reference.
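
The way a viewpoint falls into the minimum point of the potential field can be sketched as follows. The travel vector used here simply points from the current position toward a known object center and shrinks with distance; this is a stand-in assumption for illustration, whereas in the method described above the vector would be composed from azimuth vectors. All names and parameter values are hypothetical.

```python
def travel_vector(position, object_center, gain=0.5):
    """Hypothetical travel vector: points toward the object center, zero at the center."""
    px, py = position
    cx, cy = object_center
    return gain * (cx - px), gain * (cy - py)

def follow_to_stable_point(start, vector_at, tol=1e-3, max_steps=1000):
    """Move the viewpoint along the travel vectors until the vector is nearly zero."""
    x, y = start
    for _ in range(max_steps):
        dx, dy = vector_at((x, y))
        if (dx * dx + dy * dy) ** 0.5 < tol:
            break  # the field is flat here: the viewpoint has become stable
        x, y = x + dx, y + dy
    return x, y
```

Starting far from the object center, the viewpoint in this sketch takes large steps at first and progressively smaller ones as it approaches the stabilizing point, which matches the long-distance and short-distance travel described above.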

When the vector for the travel of the view window is composed from the azimuth vectors, if necessary, 
phase shifts or weights may be applied to the azimuth vectors. Also, a neural network may be employed in 
order to determine the vector for the travel of the view window or the gradient vector of the potential field 
and to extract the candidate for the predetermined object image.

The judgment as to whether the extracted candidate for the predetermined object image is or is not the
predetermined object image may be made by:

a) causing a learning means to learn a plurality of feature patterns with respect to each of a plurality of 
feature parts of the predetermined object image,

b) making judgments as to whether feature parts of the candidate for the predetermined object image are 
or are not included in the plurality of the feature patterns with respect to each of the plurality of the 
feature parts of the predetermined object image, which feature patterns the learning means has learned, 
and 

c) making a judgment as to whether the relationship between the positions of the feature parts of the
candidate for the predetermined object image coincides or does not coincide with the relationship 
between the positions of the feature parts of the predetermined object image, thereby judging whether 
the candidate for the predetermined object image is or is not the predetermined object image. 
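
Steps b) and c) can be sketched in a strongly simplified form. The sketch assumes that the learning of step a) has already produced, for each feature part, a set of learned pattern labels, and that each feature part of the candidate has been reduced to a pattern label and a position normalized to the extraction area. The data layout, the part names in the usage, and the tolerance are all hypothetical and are not taken from the method described above.

```python
def is_predetermined_object(candidate_parts, learned_patterns, reference_offsets, tol=0.2):
    """Judge a candidate from its feature parts and their positional relationship.

    candidate_parts: {part_name: (pattern_label, (x, y))} (assumed format).
    learned_patterns: {part_name: set of learned pattern labels}.
    reference_offsets: {(part_a, part_b): (dx, dy)} expected offset from a to b.
    """
    # Step b): each feature part must be among the learned feature patterns.
    for name, (pattern, _position) in candidate_parts.items():
        if pattern not in learned_patterns.get(name, set()):
            return False
    # Step c): the positional relationship must coincide with the reference.
    for (a, b), (ref_dx, ref_dy) in reference_offsets.items():
        if a not in candidate_parts or b not in candidate_parts:
            return False
        (ax, ay), (bx, by) = candidate_parts[a][1], candidate_parts[b][1]
        if abs((bx - ax) - ref_dx) > tol or abs((by - ay) - ref_dy) > tol:
            return False
    return True
```

A candidate whose feature parts all match learned patterns is still rejected in this sketch when the parts are arranged differently from the reference layout.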

In the method for recognizing an object image in accordance with the present invention, the extraction
of the candidate for the predetermined object image and the judgment as to whether the extracted
candidate for the predetermined object image is or is not the predetermined object image may be
embodied as defined in Claims 2 through 35 and Claim 65, and reference should herein be made thereto.

The present invention also provides a learning method for a neural network, which comprises the steps
of:

i) extracting a target object image, for which learning operations are to be carried out, from an image,

ii) feeding a signal, which represents the extracted target object image, into a neural network, and 

iii) carrying out the learning operations of the neural network in accordance with the input target object 
image. 

In order to extract the target object image, the same methods as those for the extraction of the
candidate for the predetermined object image in the method for recognizing an object image in accordance
with the present invention may be employed.

As in the aforesaid method for recognizing an object image in accordance with the present invention,
the extraction of the target object image in the learning method for a neural network in accordance with the
present invention may be embodied as defined in Claims 36 through 64, and reference should herein be
made thereto.

With the method for recognizing an object image in accordance with the present invention, the 
candidate for the predetermined object image is extracted from an image, and thereafter a judgment is 
made as to whether the extracted candidate for the predetermined object image is or is not the 
predetermined object image. Therefore, a judgment as to whether the extracted candidate for the
predetermined object image is or is not the predetermined object image can be made accurately and easily
without a heavy burden being imposed on a judgment means of a system, in which the method for
recognizing an object image in accordance with the present invention is carried out. Also, the extraction
process and the judgment process can be carried out simultaneously with each other, and very quick
operations can be achieved with special hardware functions for carrying out the recognition of the object
image. Additionally, processing need be carried out only for a limited part of the image, in which the
candidate for the predetermined object image is embedded, and therefore the operation time can be kept 
short. 

The extraction of the candidate for the predetermined object image may be carried out by causing the 
center point of the view window, which has a predetermined size, to travel to the position of the candidate
for the predetermined object image, and determining the extraction area in accordance with the size and/or
the shape of the candidate for the predetermined object image. During the determination of the extraction 
area, the center point of the view window is taken as a reference. 

Alternatively, the extraction of the candidate for the predetermined object image may be carried out by 
cutting out an image, which falls in the region inside of the view window having a predetermined size, from
the image, and detecting a contour line of the candidate for the predetermined object image from the
cut-out image. Thereafter, contour line components, which are tilted at a predetermined angle with respect to
contours of a group of concentric circles surrounding the center point of the view window, are extracted
from the contour line of the candidate for the predetermined object image. Azimuth vectors are detected
from these contour line components. A vector is then composed from the azimuth vectors, and a vector for
the travel of the view window is thereby determined. In this manner, the direction, to which the center point
of the view window should travel, is determined. The extraction area is then determined in accordance with 
the size and/or the shape of the candidate for the predetermined object image, the center point of the view 
window being taken as a reference during the determination of the extraction area. In cases where the
cut-out image is transformed with the complex-log mapping, the candidate for the predetermined object image
can be extracted in the same manner as that when the extraction of the candidate for the predetermined 
object image is carried out in the Cartesian plane. 

As another alternative, the extraction of the candidate for the predetermined object image may be
carried out by cutting out an image, which falls in the region inside of the view window having a
predetermined size, from the image, and detecting a region, which approximately coincides in color with the 
candidate for the predetermined object image, from the cut-out image. A vector for the travel of the view 
window is then determined from the azimuth and the distance of the detected region. In this manner, the 
direction, to which the center point of the view window should travel, is determined. The extraction area is
then determined in accordance with the size and/or the shape of the candidate for the predetermined object
image, the center point of the view window being taken as a reference during the determination of the 
extraction area. 

The term "approximately coinciding in color with a candidate for a predetermined object image" as 
used herein means that the distance on a chromaticity diagram shown in Figure 67 between a chromaticity 
value of the candidate for the predetermined object image and a chromaticity value at an arbitrary point of 
the cut-out image, which falls in the region inside of the view window having a predetermined size, is
smaller than a predetermined threshold value. Specifically, in cases where chromaticity values at certain points of
the cut-out image are spaced a distance larger than the predetermined threshold value on the chromaticity
diagram from the chromaticity value of the candidate for the predetermined object image, the region
constituted of these points is not extracted. In cases where chromaticity values at certain points of the
cut-out image are spaced a distance smaller than the predetermined threshold value on the chromaticity
diagram from the chromaticity value of the candidate for the predetermined object image, the region
constituted of these points is extracted.
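
The threshold test can be sketched as follows, assuming that each point of the cut-out image is already expressed as a pair of chromaticity coordinates and that the distance on the chromaticity diagram is the Euclidean distance. Both assumptions, the function name, and the sample values are hypothetical; the chromaticity diagram of Figure 67 is not reproduced here.

```python
import math

def color_match_mask(chroma_image, target_chroma, threshold):
    """Mark points whose chromaticity lies within `threshold` of the target.

    chroma_image: rows of (x, y) chromaticity pairs (assumed representation).
    Returns rows of booleans; True means the point belongs to an extracted region.
    """
    tx, ty = target_chroma
    return [
        [math.hypot(cx - tx, cy - ty) < threshold for (cx, cy) in row]
        for row in chroma_image
    ]
```

For example, with a target chromaticity of (0.31, 0.30) and a threshold of 0.05, a point at (0.32, 0.31) is extracted while a point at (0.60, 0.35) is not.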

With the aforesaid alternative based on color, in cases where the cut-out image is transformed with the
complex-log mapping, the candidate for the predetermined object image can be extracted in the same
manner as that when the extraction of the candidate for the predetermined object image is carried out in the 
Cartesian plane. 

Also, in the aforesaid alternative based on color, after a plurality of regions are extracted, those of the extracted
regions which exhibit a high degree of coincidence in color with the candidate for the
predetermined object image and which are located at positions close to one another should preferably be caused to
cooperate with each other and thereby be emphasized. Also, a region, which exhibits a high degree of
coincidence in color with the candidate for the predetermined object image, and a region, which exhibits a 
low degree of coincidence in color with the candidate for the predetermined object image and which is 
located at a position spaced apart from the region exhibiting a high degree of coincidence in color with the 
candidate for the predetermined object image, should preferably be caused to compete with each other, 
whereby the region, which exhibits a low degree of coincidence in color with the candidate for the 
predetermined object image, is erased. Also, regions, which exhibit a high degree of coincidence in color 
with the candidate for the predetermined object image and which are located at positions spaced apart from 
each other, should preferably be caused to compete with each other. In this manner, a region exhibiting a 
high degree of coincidence in color with the candidate for the predetermined object image, which region 
has a size and a shape appropriate for the region to be selected, is kept unerased. Also, a region exhibiting 
a high degree of coincidence in color with the candidate for the predetermined object image, which region 
has a size and a shape inappropriate for the region to be selected, is erased. Thus a region, which is most 
appropriate in the region inside of the view window, is selected as a target object image region. The 
azimuth and the distance of the selected region in the complex-log mapped plane are detected as the
vector for the travel of the view window. In such cases, the region, which exhibits a high degree of 
coincidence in color with the candidate for the predetermined object image, can be extracted easily. 
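
As an illustrative sketch of how an azimuth and a distance read off in the complex-log mapped plane correspond to a travel vector in the Cartesian plane, the following assumes the mapping w = log(z), with z measured from the center point of the view window. The function names and the sample coordinates are hypothetical.

```python
import cmath

def to_complex_log(x, y, cx, cy):
    """Map Cartesian point (x, y) to (log-distance, azimuth) about the window center (cx, cy)."""
    w = cmath.log(complex(x - cx, y - cy))  # undefined at the center point itself
    return (w.real, w.imag)

def travel_vector(log_distance, azimuth):
    """Convert a position in the complex-log mapped plane back to a Cartesian offset."""
    z = cmath.exp(complex(log_distance, azimuth))
    return (z.real, z.imag)

# A selected region located at (13, 14) while the view window is centered at (10, 10).
u, v = to_complex_log(13.0, 14.0, cx=10.0, cy=10.0)
dx, dy = travel_vector(u, v)  # offset by which the window center would travel
```

Here the recovered offset is approximately (3, 4), i.e., the displacement from the window center to the selected region.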

When the candidate for the predetermined object image is extracted in the manner described above, it 
is possible to extract not only a candidate for a specific object image but also a candidate for a 
predetermined object image having any shape. Also, even if the background of the candidate for the
predetermined object image in the image is complicated, the candidate for the predetermined object image 
can be extracted appropriately. Additionally, processing need be carried out only for a limited part of the 
image, in which the candidate for the predetermined object image is embedded, and therefore the operation 
time can be kept short. 

As a further alternative, the extraction of the candidate for the predetermined object image may be
carried out by cutting out a plurality of images, which fall in the region inside of the view window, at a 
plurality of times having a predetermined time difference therebetween, calculating the difference between 
contour lines of object images embedded in the plurality of the cut-out images, and detecting a movement
of a background in a vertical or horizontal direction in the region inside of the view window, the movement
being detected from the calculated difference. At the same time, the images, which fall in the region inside
of the view window, are transformed with the complex-log mapping into complex-log mapped images. The 
difference between contour lines of object images, which lines extend in the radial direction, is calculated
from the complex-log mapped images, and a movement of the background in an in-plane rotating direction
is thereby detected. Also, the difference between contour lines of object images, which lines extend in the 
annular direction, is calculated from the complex-log mapped images, and a movement of the background 
in the radial direction is thereby detected. Thereafter, the movement of the background is compensated for 
in accordance with the detected movement of the background in the vertical or horizontal direction, in the
in-plane rotating direction, and/or in the radial direction. A contour line of an object, which shows a
movement different from the movement of the background, is detected from the image, in which the 
movement of the background has been compensated for. Azimuth vectors are then detected from 
components of the contour line, which are tilted at a predetermined angle with respect to the annular 
direction in the complex-log mapped plane. A vector is then composed from the azimuth vectors, and a
vector for the travel of the view window is thereby determined. In this manner, the direction, to which the
center point of the view window should travel, is determined. The extraction area for the extraction of the 
candidate for the predetermined object image is then determined in accordance with the size and/or the 
shape of the object, the center point of the view window being taken as a reference during the 
determination of the extraction area. 
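
The reason why rotation and radial movement of the background can be read from the complex-log mapped images may be illustrated with a short sketch. Under the mapping w = log(z) about the window center, an in-plane rotation becomes a shift along the angular axis, and an enlargement or reduction becomes a shift along the radial (log-distance) axis. The function names and the sample values below are hypothetical.

```python
import cmath
import math

def complex_log(x, y):
    """Return (log-radius, angle) of a point measured from the view-window center."""
    w = cmath.log(complex(x, y))
    return (w.real, w.imag)

def rotate_and_scale(x, y, angle, scale):
    """Rotate a point about the origin by `angle` radians and scale it by `scale`."""
    z = complex(x, y) * cmath.rect(scale, angle)
    return (z.real, z.imag)

p = (2.0, 1.0)                                   # a contour point at the time t
q = rotate_and_scale(*p, angle=math.radians(30), scale=1.5)  # the same point later
u0, v0 = complex_log(*p)
u1, v1 = complex_log(*q)
radial_shift = u1 - u0    # equals log(1.5): movement in the radial direction
angular_shift = v1 - v0   # equals 30 degrees in radians: in-plane rotation
```

Because both kinds of movement reduce to plain shifts in the mapped plane, they can be detected from contour-line differences in the same way as vertical or horizontal movement in the Cartesian plane. The sketch ignores the wrap-around of the angle at plus or minus 180 degrees.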

In the manner described above, only the candidate for the predetermined object image can be
extracted in cases where the candidate for the predetermined object image is moving in the region inside of 
the view window and in cases where the whole image, i.e., the background, is moving. Also, it is possible to 
follow a candidate for the predetermined object image, which moves from moment to moment, to find the
candidate for the predetermined object image at the center point of the view window, and thereby to extract
the candidate for the predetermined object image. Additionally, even if the background of the candidate for
the predetermined object image in the image is complicated, the candidate for the predetermined object 
image can be extracted appropriately. Further, processing need be carried out only for a limited part of the 
image, in which the candidate for the predetermined object image is embedded, and therefore the operation 
time can be kept short. 

With the aforesaid further alternative, in cases where the cut-out image is transformed with the
complex-log mapping, the candidate for the predetermined object image can be extracted in the same manner as
when the extraction of the candidate for the predetermined object image is carried out in the Cartesian
plane.

Also, in cases where the extraction of the candidate for the predetermined object image in accordance
with its contour lines and the extraction of the candidate for the predetermined object image in accordance
with its color are carried out simultaneously in the manner described above, the candidate for the
predetermined object image can be extracted more accurately.

The extraction of the candidate for the predetermined object image in accordance with its contour lines,
the extraction of the candidate for the predetermined object image in accordance with its color, and the
extraction of the candidate for the predetermined object image in accordance with the movement should
preferably be carried out simultaneously. In such cases, the candidate for the predetermined object image
can be extracted even more accurately.

Further, as described above, the extraction of the candidate for the predetermined object image may be
carried out by creating a map of the potential field of the whole image, from which the candidate for the
predetermined object image is to be extracted, and determining an extraction area in accordance with the
size and/or the shape of the candidate for the predetermined object image, a minimum point of the potential
in the map being taken as a reference during the determination of the extraction area. In such cases, it is
possible to extract not only a candidate for a specific object image but also a candidate for a predetermined
object image having any shape. Also, even if the background of the candidate for the predetermined object
image in the image is complicated, the candidate for the predetermined object image can be extracted
appropriately.

Specifically, the vectors for the travel of the view window, which are determined from the contour lines, 
the color, and/or the movement, are taken as gradient vectors of a potential field. A map of the potential 
field of the whole image is created from the gradient vectors of the potential field. The extraction area is 
then determined in accordance with the size and/or the shape of the candidate for the predetermined object
image by taking a minimum point of the potential in the map as a reference. In this manner, the minimum 
point in the candidate for the predetermined object image, i.e., the center point of the candidate for the
predetermined object image, can be found from the gradients of the potential field. Therefore, the candidate
for the predetermined object image can be extracted very accurately and efficiently.

Also, with this method for the extraction of the candidate for the predetermined object image, it is 
possible to extract not only a candidate for a specific object image but also a candidate for a predetermined 
object image having any shape. Also, even if the background of the candidate for the predetermined object 
image in the image is complicated, the candidate for the predetermined object image can be extracted
appropriately. 

Additionally, with this method for the extraction of the candidate for the predetermined object image, in 
cases where the cut-out image is transformed with the complex-log mapping, the map of the potential field 
can be created, and the candidate for the predetermined object image can be extracted in the same manner 
as when the extraction of the candidate for the predetermined object image is carried out in the
Cartesian plane. 
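
The creation of a potential map from the vectors for the travel of the view window, and the search for its minimum point, can be sketched as follows. This is a minimal sketch under stated assumptions: the travel vectors are treated as the negative gradient of the potential (they point downhill, toward the object), the field is sampled on a regular grid, and the map is obtained by trapezoidal integration along the first column and then along each row. The function names and the synthetic vector field are hypothetical.

```python
def potential_map(travel_x, travel_y):
    """Integrate travel vectors (taken as the negative gradient) into a potential map."""
    rows, cols = len(travel_x), len(travel_x[0])
    pot = [[0.0] * cols for _ in range(rows)]
    # First column: integrate along the row index with the trapezoidal rule.
    for i in range(1, rows):
        pot[i][0] = pot[i - 1][0] - 0.5 * (travel_y[i - 1][0] + travel_y[i][0])
    # Each row: integrate along the column index, starting from the first column.
    for i in range(rows):
        for j in range(1, cols):
            pot[i][j] = pot[i][j - 1] - 0.5 * (travel_x[i][j - 1] + travel_x[i][j])
    return pot

def minimum_point(pot):
    """Return (row, col) of the smallest potential value."""
    cells = ((i, j) for i in range(len(pot)) for j in range(len(pot[0])))
    return min(cells, key=lambda ij: pot[ij[0]][ij[1]])

# Synthetic travel vectors that all point toward an object center at row 3, column 5.
center_row, center_col = 3, 5
tx = [[float(center_col - j) for j in range(8)] for i in range(6)]
ty = [[float(center_row - i) for j in range(8)] for i in range(6)]
pot = potential_map(tx, ty)
best = minimum_point(pot)  # reference point for determining the extraction area
```

With real travel vectors the field is generally not exactly integrable, so the result would depend on the integration path; a practical implementation would likely average over several paths or use a least-squares fit.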

Further, in cases where the judgment as to whether the candidate for the predetermined object image is 
or is not the predetermined object image is made from feature parts of the predetermined object image and 
the positions of the feature parts in the predetermined object image, an accurate judgment can be made as
to whether the extracted candidate for the predetermined object image is or is not the predetermined object
image. Therefore, the performance of the system, in which the method for recognizing an object image in 
accordance with the present invention is employed, can be kept high. 

Specifically, with the method for recognizing an object image in accordance with the present invention, 
as described above, during the judgment as to whether the extracted candidate for the predetermined
object image is or is not the predetermined object image, the learning means is caused to learn a plurality
of feature patterns with respect to each of a plurality of feature parts of the predetermined object image. 
Judgments are then made as to whether feature parts of the candidate for the predetermined object image 
are or are not included in the plurality of the feature patterns with respect to each of the plurality of the 
feature parts of the predetermined object image, which feature patterns the learning means has learned.
Thereafter, a judgment is made as to whether the relationship between the positions of the feature parts of
the candidate for the predetermined object image coincides or does not coincide with the relationship 
between the positions of the feature parts of the predetermined object image. A judgment is thereby made 
as to whether the candidate for the predetermined object image is or is not the predetermined object image. 
In such cases, even if the feature parts of the candidate for the predetermined object image, on which a
judgment is to be made, vary for different candidates for predetermined object images, the judgment as to
whether the candidate for the predetermined object image is or is not the predetermined object image can 
be made accurately from the plurality of the feature patterns, which the learning means has learned. 
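
The two-stage judgment described above (feature patterns first, then the relationship between their positions) can be sketched as follows. Everything in the sketch is hypothetical: the binary toy patterns, the similarity measure, the thresholds, and the normalized expected positions are assumptions chosen only to make the control flow concrete.

```python
def similarity(a, b):
    """Fraction of positions at which two equal-length binary patterns agree."""
    return sum(1 for p, q in zip(a, b) if p == q) / len(a)

def part_matches(candidate_part, learned_patterns, min_similarity=0.8):
    """True if the candidate part resembles any of the learned feature patterns."""
    return any(similarity(candidate_part, pat) >= min_similarity
               for pat in learned_patterns)

def positions_consistent(positions, expected, tolerance=0.15):
    """True if every part lies within `tolerance` of its expected normalized position."""
    return all(abs(positions[name][0] - expected[name][0]) <= tolerance and
               abs(positions[name][1] - expected[name][1]) <= tolerance
               for name in expected)

def is_target(candidate, learned, expected):
    """Both judgments must succeed: feature patterns first, then their positions."""
    parts_ok = all(part_matches(candidate["parts"][name], learned[name])
                   for name in learned)
    return parts_ok and positions_consistent(candidate["positions"], expected)

# Several learned patterns per feature part (toy data).
learned = {"right_eye": [[1, 1, 0, 1, 1], [0, 1, 1, 1, 0]],
           "left_eye": [[1, 1, 0, 1, 1], [0, 1, 1, 1, 0]],
           "mouth": [[1, 0, 0, 0, 1], [1, 1, 1, 1, 1]]}
expected = {"right_eye": (0.3, 0.4), "left_eye": (0.7, 0.4), "mouth": (0.5, 0.8)}

parts = {"right_eye": [1, 1, 0, 1, 1], "left_eye": [0, 1, 1, 1, 0], "mouth": [1, 1, 1, 1, 1]}
face = {"parts": parts,
        "positions": {"right_eye": (0.32, 0.42), "left_eye": (0.68, 0.38), "mouth": (0.5, 0.75)}}
scrambled = {"parts": parts,
             "positions": {"right_eye": (0.32, 0.42), "left_eye": (0.68, 0.38), "mouth": (0.5, 0.1)}}
face_result = is_target(face, learned, expected)
scrambled_result = is_target(scrambled, learned, expected)
```

The second candidate contains the same feature parts as the first but with the mouth placed above the eyes, so it passes the feature-pattern judgment and fails the positional judgment.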

In cases where the learning operations of the learning means are carried out with the learning method 
for a neural network in accordance with the present invention by utilizing a neural network, in particular, by
utilizing Kohonen's self-organization, self-organization of a plurality of feature patterns is effected with
the topological mapping, and the learning means can efficiently learn the plurality of feature patterns. 
Therefore, judgments as to whether feature parts of the predetermined object image are or are not included 
in feature parts of the candidate for the predetermined object image can be made efficiently regardless of a 
change in the angle of the object image and a difference among object images. 
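
A very small sketch of Kohonen-style self-organization is given below to illustrate how a plurality of feature patterns can come to be represented by different units of a map. It is a generic one-dimensional self-organizing map, not the specific network of the embodiments: the map size, learning rate, neighborhood schedule, deterministic initialization, and the six-element toy eye patterns are all assumptions.

```python
import math

def best_unit(weights, x):
    """Index of the unit whose weights are closest (squared Euclidean distance) to x."""
    return min(range(len(weights)),
               key=lambda u: sum((w - xi) ** 2 for w, xi in zip(weights[u], x)))

def train_som(samples, n_units, epochs=50, rate=0.5, radius=1.0):
    """Train a one-dimensional Kohonen map and return the list of unit weight vectors."""
    dim = len(samples[0])
    # Deterministic initialization near 0.5 with a small per-unit offset (assumption).
    weights = [[0.5 + 0.01 * u for _ in range(dim)] for u in range(n_units)]
    for epoch in range(epochs):
        decay = 1.0 - epoch / epochs  # shrink the rate and the neighborhood over time
        sigma = radius * decay + 1e-3
        for x in samples:
            winner = best_unit(weights, x)
            for u in range(n_units):
                # Gaussian neighborhood on the unit index yields the topological ordering.
                h = math.exp(-((u - winner) ** 2) / (2 * sigma ** 2))
                weights[u] = [w + rate * decay * h * (xi - w)
                              for w, xi in zip(weights[u], x)]
    return weights

closed_eye = [0, 0, 1, 1, 0, 0]  # toy pattern
open_eye = [1, 1, 0, 0, 1, 1]    # toy pattern
weights = train_som([closed_eye, open_eye], n_units=4)
closed_unit = best_unit(weights, closed_eye)
open_unit = best_unit(weights, open_eye)
```

After training, the two patterns are answered by different units of the map, and units lying between them hold intermediate weight vectors.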

Also, in cases where a neural network, in particular, the neocognitron, is utilized in order to make
judgments as to whether feature parts of the candidate for the predetermined object image are or are not
included in the plurality of the feature patterns, which the learning means has learned, and/or a judgment as
to whether the relationship between the positions of the feature parts of the candidate for the predetermined
object image coincides or does not coincide with the relationship between the positions of the feature parts
of the predetermined object image, the results of judgment are not affected by a shift in
position of the candidate for the predetermined object image, and the performance of the system, in which 
the method for recognizing an object image in accordance with the present invention is employed, can be 
kept high. 

Additionally, in cases where a face image is taken as the predetermined object image, and right eye, 
left eye, and mouth patterns are taken as the plurality of feature parts, on which the learning operations are
to be carried out, a judgment as to whether a candidate for the face image is or is not the face image can 
be made regardless of a change in the facial expression, a shift in position of the candidate for the face 
image, or the like. Further, a candidate for the face image different from the face image, which has been 
utilized during the learning operations, can be judged as being the face image.

With the learning method for a neural network in accordance with the present invention, a target object
image, for which learning operations are to be carried out, is extracted from an image, and a signal, which 
represents the extracted target object image, is fed into a neural network. The learning operations of the 
neural network are then carried out in accordance with the input target object image. Therefore, the target
object image can be extracted automatically from an image, the extracted target object image can be
classified in an arranged form, and the learning operations can thereby be carried out. Accordingly, a
human being need not intervene in order to extract and normalize the target of the learning operations, and 
the learning operations can be carried out efficiently. 

The target object image, on which the learning operations are to be carried out, may be extracted in the
same manner as that in the extraction of the candidate for the predetermined object image in the aforesaid
method for recognizing an object image in accordance with the present invention. 

Also, the center point of the view window having a predetermined size may be caused to travel to the 
center point of the candidate for the predetermined object image, and the size and/or the shape of the
candidate for the predetermined object image may be normalized by taking the center point of a view
window and a contour line of the candidate for the predetermined object image as reference. Thereafter, the 
normalized candidate for the predetermined object image may be extracted. In such cases, object images 
having different sizes and/or shapes can be extracted as those having approximately identical sizes and/or 
shapes. Accordingly, the burden on a step, such as the judgment step or the learning step, which is carried out
after the extraction of the contour line of the object image, can be kept light. Also, the judgment and the
learning operations can be carried out appropriately. 
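
The size normalization described above can be illustrated with a simple sketch in which the contour points are re-expressed relative to the center point of the view window and scaled to a common mean radius. The function name, the target radius, and the sample contours are hypothetical, and the sketch normalizes size only, not shape.

```python
import math

def normalize_contour(contour, center, target_radius=1.0):
    """Shift contour points so that `center` is the origin and scale to a common mean radius."""
    cx, cy = center
    shifted = [(x - cx, y - cy) for x, y in contour]
    mean_radius = sum(math.hypot(x, y) for x, y in shifted) / len(shifted)
    scale = target_radius / mean_radius
    return [(x * scale, y * scale) for x, y in shifted]

small = [(11, 10), (10, 11), (9, 10), (10, 9)]    # radius 1 about (10, 10)
large = [(55, 50), (50, 55), (45, 50), (50, 45)]  # radius 5 about (50, 50)
norm_small = normalize_contour(small, center=(10, 10))
norm_large = normalize_contour(large, center=(50, 50))
```

After normalization the small and the large contour coincide, so that a subsequent judgment or learning step sees object images of approximately identical size.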

BRIEF DESCRIPTION OF THE DRAWINGS 

Figure 1 is a block diagram showing the fundamental concept behind the method for recognizing an
object image in accordance with the present invention, 

Figure 2 is a flow chart showing the method for recognizing an object image in accordance with the 
present invention, 

Figures 3A and 3B are explanatory views showing at what parts of a figure the human viewpoint
stabilizes,

Figure 4 is an explanatory view showing an example of a neural network for extracting a candidate for 
the predetermined object image in an embodiment of the method for recognizing an object image in 
accordance with the present invention, wherein a vector for the travel of a view window is calculated from 
a contour line of the candidate for the predetermined object image, 
Figures 5A, 5B, and 5C are explanatory views showing the results of transformation of various figures
with the complex-log mapping, 

Figure 6 is an explanatory view showing how components, which are tilted at predetermined angles with 
respect to an annular direction, are extracted from a contour line extending in the annular direction, 
Figure 7 is an explanatory view showing how components, which are tilted at predetermined angles with
respect to the annular direction, are extracted from a contour line extending in the radial direction,

Figure 8 is an explanatory view showing an e1-layer, an f1-layer, a g1-layer, an h1-layer, and an i-layer of
the neural network shown in Figure 4, which is employed in a first embodiment of the first step of the 
method for recognizing an object image in accordance with the present invention, 
Figure 9 is a graph showing weights applied to outputs of neurons, 

Figures 10A, 10B, 10C, and 10D are explanatory views showing complex-log mapped planes and
corresponding Cartesian planes as an aid in explaining how a view window travels to the center point of 
a candidate for a predetermined object image, 

Figures 11A, 11B, 11C, and 11D are explanatory views showing complex-log mapped planes and
corresponding Cartesian planes as an aid in explaining how a view window travels to a point of
intersection of contour lines of a candidate for a predetermined object image,

Figure 12 is an explanatory view showing how the direction of travel of a view window is determined in 
cases where the center point of the view window travels from the side inward from a candidate for a 
predetermined object image to a point of intersection of contour lines of the candidate for the 
predetermined object image, 

Figure 13 is an explanatory view showing how the direction of travel of a view window is determined by
applying a phase shift, 

Figure 14 is a block diagram showing the fundamental concept behind embodiments of a first step for 
extracting a candidate for a predetermined object image in accordance with a movement of the 
candidate for the predetermined object image in the method for recognizing an object image in 
accordance with the present invention,

Figure 15 is an explanatory view showing part of a neural network, which part detects a movement of a 
background in an image, the neural network being employed to extract a candidate for a predetermined 
object image in accordance with a movement of the candidate for the predetermined object image in the
first step of the method for recognizing an object image in accordance with the present invention,
Figures 16A and 16B are diagrams showing images at the time t and the time t + a,

Figure 17 is an explanatory view showing a synaptic weights pattern for extracting the contour lines, 
Figures 18A and 18B are diagrams showing contour lines in the images at the time t and the time t + a, 
Figure 19 is a diagram showing the difference between the contour lines in the images at the time t and
the time t + a, 

Figure 20 is an explanatory view showing a synaptic weights pattern for extracting upward movement 
components and a synaptic weights pattern for extracting leftward movement components, 
Figures 21A and 21B are diagrams showing images, in which the movement components of contour lines
in a parallel direction have been extracted,

Figures 22A and 22B are diagrams showing images at the time t and the time t + a, 

Figures 23A and 23B are diagrams showing images, which are obtained by transforming the images at 
the time t and the time t + a with complex-log mapping, 

Figures 24A and 24B are diagrams showing contour lines in the images, which have been obtained by
transforming the images at the time t and the time t + a with complex-log mapping,

Figure 25 is a diagram showing the difference between the contour lines in the images, which have been 
obtained by transforming the images at the time t and the time t + a with complex-log mapping, 
Figure 26 is an explanatory view showing a synaptic weights pattern for extracting components in an 
enlarging direction and a synaptic weights pattern for extracting components in a reducing direction, 
Figures 27A and 27B are diagrams showing images, in which the movement components of the contour
lines in the radial direction have been extracted, 

Figures 28A and 28B are diagrams showing images at the time t and the time t + a, 

Figures 29A and 29B are diagrams showing images, which are obtained by transforming the images at 
the time t and the time t + a with complex-log mapping, 
Figures 30A and 30B are diagrams showing contour lines in the images, which have been obtained by
transforming the images at the time t and the time t + a with complex-log mapping, 

Figure 31 is a diagram showing the difference between the contour lines in the images, which have been 
obtained by transforming the images at the time t and the time t + a with complex-log mapping, 
Figure 32 is an explanatory view showing a synaptic weights pattern for extracting components in a 
clockwise rotating direction and a synaptic weights pattern for extracting components in a
counter-clockwise rotating direction,

Figures 33A and 33B are diagrams showing images, in which the movement components of the contour 
lines in the rotating direction have been extracted, 

Figure 34 is a flow chart showing a process for canceling the movement of the background in an image, 
Figures 35A and 35B are diagrams showing images at the time t and the time t + a, in which the
movement of the background has been canceled, 

Figure 36 is a diagram showing an image, in which the movement of the background has been canceled, 
Figure 37 is a block diagram showing the fundamental concept behind the extraction of a candidate for 
the predetermined object image, 

Figure 38 is an explanatory view showing an example of a neural network for determining a vector for the
travel of the view window from a movement of a candidate for a predetermined object image, 
Figure 39 is an explanatory view showing an example of a neural network for extracting a candidate for a 
predetermined object image in accordance with a color of the candidate for the predetermined object 
image in an embodiment of the method for recognizing an object image in accordance with the present
invention,

Figure 40 is an explanatory view showing the results of transformation of various figures with the 
complex-log mapping, 

Figure 41 is an explanatory view showing an example of a neural network for detecting the level of 
coincidence in color, 

Figure 42 is an explanatory view showing local area limited, interactive weighted connections for
competition and cooperation, which are designed such that a region exhibiting a high degree of 
coincidence in color with a candidate for a predetermined object image may be selected, 
Figure 43 is a graph showing an example of weighting of an output of a neuron, 

Figure 44 is an explanatory view showing an e"-layer, a g"-layer, and an i"-layer of the neural network
shown in Figure 39, which is employed in the first embodiment of the first step of the method for
recognizing an object image in accordance with the present invention,

Figure 45 is a first explanatory view showing complex-log mapped planes and a corresponding Cartesian 
plane as an aid in explaining how a view window travels to the center point of a candidate for a
predetermined object image,
Figure 46 is a second explanatory view showing complex-log mapped planes and a corresponding
Cartesian plane as an aid in explaining how a view window travels to the center point of a candidate for a
predetermined object image,
Figure 47 is a third explanatory view showing complex-log mapped planes and a corresponding
Cartesian plane as an aid in explaining how a view window travels to the center point of a candidate for a
predetermined object image,

Figure 48 is an explanatory view showing how the center point of a view window travels on an image, in 
which a human image is embedded, 
Figures 49A and 49B are explanatory views showing the results of transformation of large and small face
images with the complex-log mapping, 

Figure 50 is an explanatory view showing how a candidate for a predetermined object image is 
normalized in an embodiment of the method for recognizing an object image in accordance with the 
present invention, 

Figure 51 is an explanatory view showing the results of transformation of a long face image with the
complex-log mapping, 

Figures 52A, 52B, 52C, and 52D are explanatory views showing how a candidate for a predetermined 
object image is extracted in an embodiment of the method for recognizing an object image in 
accordance with the present invention in cases where a view window stopped at a point of intersection of
contour lines of the candidate for the predetermined object image,

Figure 53 is an explanatory view showing an example of a neural network for making a judgment in an 
embodiment of the second step of the method for recognizing an object image in accordance with the 
present invention as to whether a candidate for a predetermined object image is or is not the 
predetermined object image, 

Figure 54 is an explanatory view showing a closed eye pattern and an open eye pattern,

Figure 55 is an explanatory view showing the results of learning of a superposed eye pattern with
Kohonen's self-organization,

Figure 56 is an explanatory view showing the results of learning of a half-open eye pattern with
Kohonen's self-organization,

Figure 57 is an explanatory view showing a map of a potential field of an image, in which an elliptic
image, a triangular image, and a square image are embedded, 
Figure 58 is an explanatory view showing gradients of a potential field, 

Figure 59 is an explanatory view showing an example of a neural network for extracting a candidate for a 
predetermined object image by moving the center point of a view window to the center point of the 
candidate for the predetermined object image in a second embodiment of the first step in the method for
recognizing an object image in accordance with the present invention, 

Figures 60A and 60B are explanatory views showing weights of synaptic connections between layers of 
the neural network, which is employed in the second embodiment of the first step in the method for 
recognizing an object image in accordance with the present invention, 
Figures 61A and 61B are explanatory views showing how a difference in position between the center
point of a view window and a candidate for a predetermined object image is detected in the second 
embodiment of the first step in the method for recognizing an object image in accordance with the 
present invention, 

Figures 62A and 62B are explanatory views showing weights of synaptic connections between layers of 
the neural network, which is employed for causing the center point of a view window to travel towards a
point of intersection of contour lines of a candidate for a predetermined object image in the second 
embodiment of the first step in the method for recognizing an object image in accordance with the 
present invention, 

Figures 63A and 63B are explanatory views showing how a difference in position between the center 
point of a view window and a candidate for a predetermined object image is detected by causing the
center point of the view window to travel towards a point of intersection of contour lines of the candidate 
for the predetermined object image in the second embodiment of the first step in the method for 
recognizing an object image in accordance with the present invention, 

Figure 64 is an explanatory view showing an example of a neural network for extracting a candidate for a 
predetermined object image by causing the center point of a view window to travel towards the center
point of the candidate for the predetermined object image in the second embodiment of the first step in 
the method for recognizing an object image in accordance with the present invention,
Figure 65 is an explanatory view showing weights of synaptic connections in a layer of the neural
network, which is employed for causing the center point of a view window to travel towards a region 
approximately coinciding in color with a candidate for a predetermined object image in the second 
embodiment of the first step in the method for recognizing an object image in accordance with the 
present invention,

Figure 66 is a block diagram showing the fundamental concept behind the learning method for a neural 
network in accordance with the present invention, and 
Figure 67 is a graph showing a chromaticity diagram. 

DESCRIPTION OF THE PREFERRED EMBODIMENTS

The present invention will hereinbelow be described in further detail with reference to the accompanying
drawings.

First, embodiments of the method for recognizing an object image in accordance with the present
invention will be described hereinbelow.

Figure 1 is a block diagram showing a fundamental concept behind the method for recognizing an 
object image in accordance with the present invention. In a first step 1, a candidate for a predetermined 
object image is extracted from a given image. Thereafter, in a second step 2, a judgment is made as to 
whether the candidate for the predetermined object image is or is not the predetermined object image. 
Figure 2 is a block diagram showing a fundamental concept behind embodiments of the extraction of a
candidate for a predetermined object image, which is carried out in the first step 1 of the method for 
recognizing an object image in accordance with the present invention. In this embodiment, the extraction of 
the candidate for the predetermined object image is carried out by causing the center point of a view 
window, which has a predetermined size, to travel to the position of the candidate for the predetermined
object image in accordance with a contour line, a color, and a movement of the candidate for the
predetermined object image, and determining an extraction area in accordance with the size and/or the 
shape of the candidate for the predetermined object image, the center point of the view window being taken 
as a reference during the determination of the extraction area. A neural network is utilized for the extraction 
of the candidate for the predetermined object image. 
As illustrated in Figure 2, this embodiment comprises a neural network processing unit 3 for search and
a processing unit 4 for judgment as to the stop state of the view window. The neural network processing
unit 3 for search iterates the travel of the view window and searches for a candidate for the predetermined
object image by utilizing a neural network. The processing unit 4 for judgment as to the stop state of the 
view window investigates the state of the travel of the view window. When the view window stops, the 
processing unit 4 makes a judgment as to whether the stop of the view window is or is not caused by the
discovery of a candidate for the predetermined object image. When necessary, the processing unit 4 
extracts the candidate for the predetermined object image. 

Specifically, in the neural network processing unit 3 for search, the position of the center point of the 
view window and the size of the view window are set in the image, and part of the image falling in the 
region inside of the view window is fetched from the image. Thereafter, vectors for the travel of the view 
window are calculated from the movement, the contour lines, and the color of the candidate for the 
predetermined object image. The vectors for the travel of the view window are combined with one another, 
and a composite vector for the travel of the view window is thereby determined. The view window is then 
caused to travel on the image in accordance with the composite vector for the travel of the view window. In 
the processing unit 4 for judgment as to the stop state of the view window, the amount of travel of the view 
window is investigated. In cases where it is found that the view window is traveling, a signal for instructing a 
further travel of the view window is fed into the neural network processing unit 3 for search. In cases where 
it is found that the view window has stopped, a judgment is made as to the stop state of the view window. 
Specifically, a judgment is made as to whether the stop of the view window on the image is or is not 
caused by the discovery of a candidate for the predetermined object image. In cases where the stop of the 
view window is not caused by the discovery of a candidate for the predetermined object image, it is not 
necessary for the view window to stay at the current stop position, and therefore a signal for setting a new 
position and a size of the view window is fed into the neural network processing unit 3 for search. Upon 
receiving this signal, the neural network processing unit 3 for search again causes the view window to 
travel. In cases where the stop of the view window is caused by the discovery of a candidate for the 
predetermined object image, the discovered candidate for the predetermined object image is cut out, i.e., is 
extracted, and a signal representing the extracted candidate for the predetermined object image is 
transferred to the second step 2 for the judgment shown in Figure 1. 
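
The interplay between the two processing units described above can be sketched as a simple loop. The following Python sketch is illustrative only: the callables `travel_vector`, `is_candidate`, and `reposition`, as well as the parameters `stop_eps` and `max_steps`, are hypothetical stand-ins introduced here and are not named in the specification.

```python
import numpy as np

def search_candidate(image, center, size, travel_vector, is_candidate,
                     reposition, stop_eps=0.5, max_steps=1000):
    """Move the view window until it stops on a candidate (illustrative sketch).

    travel_vector(image, center, size) -> 2-vector; stands in for unit 3.
    is_candidate(image, center, size)  -> bool; stands in for the stop-state
                                          judgment made by unit 4.
    reposition(center, size)           -> (center, size); new window setting
                                          used when the stop is not a discovery.
    """
    center = np.asarray(center, dtype=float)
    for _ in range(max_steps):
        step = np.asarray(travel_vector(image, center, size), dtype=float)
        if np.linalg.norm(step) > stop_eps:
            # The window is still traveling: instruct a further travel.
            center = center + step
            continue
        # The window has stopped: judge the cause of the stop.
        if is_candidate(image, center, size):
            return center, size  # candidate found; extraction would follow
        center, size = reposition(center, size)
        center = np.asarray(center, dtype=float)
    return None  # assumed give-up condition, not part of the specification
```

The stop threshold and the iteration cap are assumptions added so that the loop terminates; the specification only states that the amount of travel is investigated.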



In the neural network processing unit 3 for search, the neural network is constructed such that the view 
window may travel in the same manner as that of the movement of the human viewpoint. It has been 
reported from past studies that, in cases where a human being watches a figure shown in Figure 3A or 3B, 
his viewpoint stays for a long time at a contour or an end point of the object. In the case of Figure 3B, the 
human viewpoint stays at parts S surrounded by the broken lines. Also, experience has shown that, at the 
instant at which a human being recognizes an object as a single mass, his viewpoint is not on the contour 
line of the object, but stays at the center part surrounded by the contour line. For example, when a human 
being watches the figure shown in Figure 3A, his viewpoint stays at a part S surrounded by the broken line. 
Specifically, it can be said that the human viewpoint becomes stable at the center point of an object 
surrounded by a contour line, at the point of intersection of lines, or at a vertex of a polygon. Therefore, the 
neural network should be provided with functions such that, in cases where an object image is present only 
at a peripheral region of the view window, the view window may be moved to a position closer to the object 
image, and thereafter the center point of the view window may be moved to the center point, a vertex, or 
the like, of the object image, the view window being thereby rendered stable. 
How the vector for the travel of the view window is calculated from the contour line of an object in the 
neural network processing unit 3 for search will be described hereinbelow. 

Figure 4 is an explanatory view showing an example of a neural network for calculating the vector for 
the travel of the view window in accordance with the contour line of the candidate for the predetermined 
object image. A signal, which represents an image of the outer world fed through the view window at the 
time t, is fed into the neural network. Also, the output obtained from the neural network represents the 
amount and the direction in which the view window is to be moved. The output is considered as a two- 
dimensional vector and is hereinbelow referred to as the vector for the travel of the view window. The 
magnitude of the vector for the travel of the view window represents the level of instability at the center 
point of the view window. The direction of the vector for the travel of the view window represents the 
direction in which the view window should travel in order to become more stable. 

The basic structure of the neural network employed in this embodiment is of the layered type. However, 
the neurons in each of e1- and e2-layers are connected with each other in an interactive manner. Therefore, 
the neural network may also be referred to as a complex type. Also, in a back propagation model, which is 
popular as the layered type, every neuron in each layer is connected to every neuron in the adjacent layer. 
In contrast to the back propagation model, the neural network employed in this embodiment is of a local 
area limited connection type, in which the connections are effected only between those neurons which are 
located at corresponding positions in adjacent layers or between the neurons located in the vicinity of these 
corresponding neurons. Therefore, the number of the synapses between the layers can be reduced 
markedly, and the operation time can be kept short. Additionally, connections between the neurons in each 
interactive connection layer are effected only between those neurons the positions of which are close to one 
another. Such a local area limited connection structure is similar to the structure of the nervous circuit 
ranging from the retina of the human eye to the primary visual area of the brain. 

The layers of the neural network employed in this embodiment have the functions described below. 
This neural network is constituted of an a-layer for feeding a signal representing the image into the neural 
network, and a b-layer for transforming the input image with the complex-log mapping (logarithmic polar 
coordinate transformation). The neural network is also constituted of a c1-layer for detecting contour lines, 
which extend in the annular direction, and a c2-layer for detecting contour lines, which extend in the radial 
direction. The neural network is also constituted of a d-layer for detecting the end points of the contour lines 
extending in the radial direction, the e1-layer, which is an interactive connection circuitry layer for selecting a 
contour line extending in the annular direction, and the e2-layer, which is an interactive connection circuitry 
layer for selecting a contour line extending in the radial direction. The neural network is additionally 
constituted of an f1-layer for detecting the differences in position between components of the contour line of 
the candidate for the predetermined object image, which line extends in the annular direction, and the 
center point of the view window, and an f2-layer for detecting the differences in position between 
components of the contour line of the candidate for the predetermined object image, which line extends in 
the radial direction, and the center point of the view window. The neural network is further constituted of a 
g1-layer for detecting the azimuth vectors of the candidate for the predetermined object image with respect 
to the annular direction, and a g2-layer for detecting the azimuth vectors of the candidate for the 
predetermined object image with respect to the radial direction. The neural network is also constituted of an 
h1-layer for combining the detected azimuth vectors with respect to the annular direction, an h2-layer for 
combining the detected azimuth vectors with respect to the radial direction, and an i-layer for generating an 
output representing a vector for the travel of the view window. In the neural network employed in this 
embodiment, the synaptic connections between adjacent ones among the b-layer and those that follow are 
space invariant. Also, the synaptic connections in each interactive connection layer are space invariant. The 
space invariance is adopted in order to facilitate the model simulation and parallel processing, and the 
synaptic connections need not necessarily be space invariant. However, in cases where the synaptic 
connections are space invariant, the output of each neuron becomes equivalent to an output obtained when 
the result of an operation of convolution (i.e., matching with a synaptic weights pattern) of the synaptic 
connection matrix and the preceding neuron layer matrix is passed through a nonlinear function. Therefore, 
computational processing becomes markedly easy when simulation of the neural network is carried out with 
a computer. 

First, the signal representing the image is fed into the neural network. In the b-layer, the input image is 
transformed with the complex-log mapping by taking the center point of the view window as the pole of 
mapping. Because the image is thus transformed with the complex-log mapping in the b-layer, the 
operations in the b-layer and the layers that follow in the neural network can be processed computationally 
as the convolution between the arrangement of the image information and a space invariant synaptic 
connection information arrangement. 
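
The equivalence stated above, in which a space invariant layer amounts to matching with one shared synaptic weights pattern followed by a nonlinear function, can be illustrated with a short Python sketch. The function name `layer_output`, the thresholded rectification used as the nonlinear function, the zero padding at the border, and the restriction to odd-sized weight patterns are all assumptions made for this illustration; the specification does not fix any of them.

```python
import numpy as np

def layer_output(prev, weights, threshold=0.0):
    """Output of one space invariant layer (illustrative sketch).

    prev    : 2-D array, activity of the preceding neuron layer.
    weights : 2-D array with odd side lengths, the shared synaptic
              weights pattern applied at every position.
    """
    prev = np.asarray(prev, dtype=float)
    kh, kw = weights.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(prev, ((ph, ph), (pw, pw)))  # zero padding (assumption)
    out = np.zeros_like(prev)
    # Accumulate the weighted, shifted copies of the input: this is the
    # pattern-matching (correlation) form of the convolution.
    for dy in range(kh):
        for dx in range(kw):
            out += weights[dy, dx] * padded[dy:dy + prev.shape[0],
                                            dx:dx + prev.shape[1]]
    # Assumed nonlinear function: rectification above a threshold.
    return np.maximum(out - threshold, 0.0)
```

Because the same weight pattern is used at every position, shifting the input pattern shifts the output by the same amount, which is the property that makes simulation and parallel processing straightforward.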
Figures 5A, 5B, and 5C show the results of transformation of various figures with the complex-log 
mapping. With the complex-log mapping, in cases where the pole of mapping, i.e., the center point of the 
view window, is present at the center point of a candidate for the predetermined object image, concentric 
circle-like curved lines shown in Figure 5A are transformed into horizontal straight lines. Also, radial lines 
shown in Figure 5B are transformed into vertical straight lines, and a triangle is transformed into the pattern 
shown in Figure 5C. 

In cases where a point W(x,y) in the Cartesian plane before being transformed is mathematically expressed with 
a complex number as 

$$z = x + iy \tag{1}$$

a complex-log mapped point W' is represented by the formula 

$$W' = \ln(z) = \ln(|z|) + i\theta_z \tag{2}$$

where 

$$|z| = (x^2 + y^2)^{1/2} \tag{3}$$

$$\theta_z = \tan^{-1}(y/x) \tag{4}$$

Specifically, with the complex-log mapping, transformation is carried out such that the distance axis of the 
complex-log mapped plane may represent the logarithmic value of the distance from the center point of the 
view window, and the azimuth axis may represent the angle around the center point of the view window. 
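
As a minimal numerical illustration of the mapping, the Python sketch below maps Cartesian points to the (logarithmic distance, azimuth) plane about a chosen pole. The function name `complex_log_map` is hypothetical. Note one deliberate difference from the formula for the angle: `numpy.angle` uses the two-argument arctangent, so the azimuth covers the full range from -π to π instead of the restricted range of tan⁻¹(y/x). A point located exactly at the pole has no finite logarithmic distance and is not handled here.

```python
import numpy as np

def complex_log_map(points, pole):
    """Map (x, y) points to (ln|z|, azimuth) with `pole` as the pole of mapping."""
    points = np.asarray(points, dtype=float)
    # Express each point as a complex number relative to the pole.
    z = (points[:, 0] - pole[0]) + 1j * (points[:, 1] - pole[1])
    # Column 0: distance axis (logarithm of the distance from the pole).
    # Column 1: azimuth axis (angle around the pole, in radians).
    return np.column_stack((np.log(np.abs(z)), np.angle(z)))
```

With this sketch, points on a circle centered at the pole all share one value on the distance axis, and points on a ray from the pole all share one value on the azimuth axis, which corresponds to the straight lines described for Figures 5A and 5B.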

The information representing the complex-log mapped image is fed into the c1-layer and the c2-layer. 
In the c1-layer and the c2-layer, contour lines extending in the annular direction and in the radial direction 
are extracted, respectively, in the complex-log mapped plane. 

The term "an annular direction in a complex-log mapped plane" as used herein means the direction 
along the azimuth axis of the complex-log mapped plane. Specifically, the azimuth axis of the complex-log 
mapped plane represents the angle made around the origin of the Cartesian plane (i.e., around the center 
point of the view window in the present invention). A circle having its center point at the origin in the 
Cartesian plane is expressed as a straight line, which is parallel to the azimuth axis, in the complex-log 
mapped plane. Therefore, the direction along the azimuth axis of the complex-log mapped plane is referred 
to as the annular direction. Also, the distance axis of the complex-log mapped plane represents the distance 
from the origin of the Cartesian plane. A radial straight line, which passes through the origin, in the 
Cartesian plane is expressed as a straight line, which is parallel to the distance axis, in the complex-log 
mapped plane. Therefore, the direction along the distance axis of the complex-log mapped plane is referred 
to as the radial direction. 

In cases where a candidate for the predetermined object image is blocked by a different body, which is 
located on the foreground side, end points of the contour lines extending in the radial direction occur at the 
boundary between the candidate for the predetermined object image and the different body located on the 
foreground side. In the d-layer, such end points of the contour lines extending in the radial direction are 
extracted from the pattern received from the c2-layer, in which the contour lines extending in the radial 
direction have been extracted. The e1-layer and the e2-layer are the interactive connection layers having 
the local area limited, interactive weighted connections such that a contour line extending in the annular 
direction and a contour line extending in the radial direction, respectively, may be selected. In the e1-layer, 
of the contour lines extending in the annular direction, which have been detected by the c1-layer, lines 
having a high level of continuity in the annular direction and lines having a high intensity are emphasized, 
whereas independent or weak lines are erased. In the e2-layer, of the contour lines extending in the radial 
direction, which have been detected by the c2-layer, lines having a high level of continuity in the radial 
direction and lines having a high intensity are emphasized, whereas independent or weak lines are erased. 

Thereafter, in the f1-layer, components, which are tilted at predetermined angles with respect to the 
annular direction, are extracted from the contour line extending in the annular direction, which has been 
selected by the e1-layer taking on the form of the interactive connection layer. Specifically, as illustrated in 
Figure 6, a contour line 51 extending in the annular direction is transmitted as a signal weighted with a 
synaptic weights pattern 52 for extracting the components, which are tilted upwards to the right, and a 
synaptic weights pattern 53 for extracting the components, which are tilted upwards to the left. The synaptic 
weights patterns 52 and 53 for extracting the tilted components strengthen parts of the contour line 
extending in the direction, along which the "+" components are arrayed, and cancel parts of the contour 
line covering both the "+" and "-" components. Therefore, when the contour line 51 extending in the 
annular direction is transmitted as a signal weighted with the synaptic weights patterns 52 and 53, 
components 51a of the contour line 51, which are tilted upwards to the right, and components 51b of the 
contour line 51, which are tilted upwards to the left, are extracted. In this embodiment, the components of 
the contour line extending in the annular direction, which are tilted at angles ranging from 20 to 30 degrees 
with respect to the annular direction in the complex-log mapped plane, are extracted by the synaptic 
weights patterns 52 and 53 for extracting the tilted components. 

Also, in the f2-layer, components, which are tilted at predetermined angles with respect to the annular 
direction, are extracted from the contour line extending in the radial direction, which has been selected by 
the e2-layer taking on the form of the interactive connection layer. Specifically, as illustrated in Figure 7, a 
contour line 54 extending in the radial direction is transmitted as a signal weighted with a synaptic weights 
pattern 55 for extracting the components, which are tilted upwards to the left, and a synaptic weights pattern 
56 for extracting the components, which are tilted upwards to the right. Components 54a of the contour line 
54, which are tilted upwards to the left, and components 54b of the contour line 54, which are tilted upwards 
to the right, are thereby extracted. In this embodiment, the components of the contour line extending in the 
radial direction, which are tilted at angles ranging from 60 to 70 degrees with respect to the annular 
direction in the complex-log mapped plane, are extracted by the synaptic weights patterns 55 and 56 for 
extracting the tilted components. 

How the vector for the travel of the view window is determined from the components of the contour 
lines, which have been detected in the manner described above, will be described hereinbelow. As an aid in 
facilitating the explanation, how the vector for the travel of the view window is composed by operations in 
the e1-layer and those that follow will be described hereinbelow. 

Figure 8 is an explanatory view showing in more detail the e1-layer, the f1-layer (constituted of f1A- and 
f1B-layers), the g1-layer (constituted of g1A- and g1B-layers), the h1-layer, and the i-layer of the neural 
network shown in Figure 4, which is employed in the first embodiment of the first step of the method for 
recognizing an object image in accordance with the present invention. 

As illustrated in Figure 8, the respective layers are connected to one another through synaptic 
connections. The e1-layer and the f1-layer have the neurons arrayed two-dimensionally. The distribution of 
excited neurons in the e1-layer corresponds to the components of the contour line of the candidate for the 
predetermined object image, which line extends in the annular direction and which has been selected by 
the interactive synaptic connections in the e1-layer. 

Weight factors of synaptic connections 62, through which the excitatory signals are transmitted from the 
e1-layer to the f1A-layer, are set such that, of the distribution of excitation in the e1-layer, only the 
components 61, which are tilted upwards to the left, may be transmitted easily. Therefore, in Figure 8, in the 
f1A-layer, only a group of neurons 63, which correspond to the positions of the tilted components 61, are 
excited. Thereafter, in order for an azimuth vector to be detected, excitatory signals are transmitted from the 
f1A-layer to the g1A-layer. The g1A-layer is constituted of one-dimensionally arrayed neurons and is 
connected to the f1A-layer through synaptic connections so as to receive the signals from the group of 
neurons, which are present in the same azimuth in the f1A-layer. As a result, in the g1A-layer, a neuron 64, 
which corresponds to the azimuth of the group of the neurons 63, is excited. The azimuth of the neuron 64 
and the intensity of its excitation represent an azimuth vector. The f1A-layer and the g1A-layer should 
preferably be connected to each other such that the azimuth of the group of the neurons 63 and the 
azimuth of the neuron 64 may shift at a predetermined angle β from each other, i.e., the phases of the 
group of the neurons 63 and the neuron 64 may shift from each other. Alternatively, as illustrated in Figure 
9, weights W may be applied to the connections between the f1A-layer and the g1A-layer such that the 
weights of the connections from the neurons of the f1A-layer, which neurons correspond to positions remote 
from the view window, may be larger than the weights of the connections from the neurons of the f1A-layer, 
which neurons correspond to positions closer to the view window. Both the phase shift and the weights W 
should more preferably be employed together. 

In the same manner as that described above, excitatory signals of the components 60 tilted upwards to 
the right are transmitted from the f1B-layer to the g1B-layer. In the g1B-layer, a neuron 66 representing an 
azimuth vector is excited. In this case, too, the phase shift β and/or the weights W should preferably be 
employed. 

Thereafter, in the h1-layer, the azimuth vectors, which have been detected by the g1A-layer and the 
g1B-layer in the manner described above, are combined with each other. In the i-layer, the vector for the 
travel of the view window is determined. The composition of the vector for the travel of the view window 
may be carried out in the polar coordinate system or in the Cartesian plane. In Figure 8, the vector for the 
travel of the view window is composed in the Cartesian plane. The excitatory signals coming from the 
neurons 64 and 66 are transformed inversely to the complex-log mapping and transmitted to the i-layer. In 
the i-layer, the center of gravity of each azimuth vector is found and is fed out as the vector for the travel of 
the view window. The inputs into the i-layer are described above with respect to the detection and 
composition of the azimuth vector directed to the center point of the major object image. However, actually, 
azimuth vectors directed to a point of concentration of the contour of the major object image are also 
entered into the i-layer and are combined appropriately as will be described later. 

How the view window travels towards the candidate for the predetermined object image in accordance 
with the vector for the travel of the view window, which has been determined in accordance with the contour 
line of the candidate for the predetermined object image in the manner described above, will be described 
hereinbelow. 

Figures 10A, 10B, 10C, and 10D are explanatory views showing complex-log mapped planes and 
corresponding Cartesian planes as an aid in explaining how the center point of a view window travels from 
the side outward from a candidate for a predetermined object image to the center point of the candidate for 
the predetermined object image. In this example, as an aid in facilitating the explanation, the candidate for 
the predetermined object image is circular, and the center point of the view window travels towards the 
center point of the candidate. 

As illustrated in Figure 10A, when part of a candidate 71 for the predetermined object image comes into 
a view window 73, components of a contour line of a complex-log mapped candidate 71 for the 
predetermined object image, which are tilted at predetermined angles +α and -α in the complex-log 
mapped plane shown in Figure 10A, are detected by the f1-layer. The predetermined angles +α and -α are 
the angles of the tilted components of the contour line extending in the annular direction, which are 
extracted by the f1-layer, with respect to the annular direction. The angle of the tilt upward to the right with 
respect to the annular direction is taken as a positive angle, and the angle of the tilt upward to the left with 
respect to the annular direction is taken as a negative angle. The components tilted at the predetermined 
angles +α and -α are detected by the g1-layer as azimuth vectors 74, 74 (which correspond to azimuth 
vectors 72, 72 in the Cartesian plane). For the purposes of facilitating the travel of the center point Q of the 
view window 73 toward the center point O of the candidate 71 for the predetermined object image, a phase 
shift of +β is applied to the azimuth vector, which corresponds to the component tilted at the predetermined 
angle +α, and a phase shift of -β is applied to the azimuth vector, which corresponds to the 
component tilted at the predetermined angle -α. Also, weights in accordance with the distances between the 
center point of the view window and the components tilted at the predetermined angles +α and -α are 
applied to the azimuth vectors, which correspond to the components tilted at the predetermined angles +α 
and -α. In this manner, azimuth vectors 74', 74' in the Cartesian plane are obtained. In the h1-layer, a 
composite vector is obtained from the azimuth vectors 74', 74'. The composite vector is determined as a 
vector 75 for the travel of the view window 73. When the vector 75 for the travel of the view window 73 is 
thus determined, the view window 73 travels in accordance with the vector 75 for the travel of the view 
window 73. Figure 10B shows the state of the view window 73 which has thus traveled. 

As illustrated in Figure 10B, the center point Q of the view window 73 and the candidate 71 for the 
predetermined object image are brought to positions comparatively close to each other, and the whole 
candidate 71 for the predetermined object image comes into the region inside of the view window 73. In this 
state, two components tilted at the predetermined angle +α and two components tilted at the predetermined 
angle -α are detected. Therefore, four azimuth vectors 74, 74, 74, 74 are detected. In this case, as in 
the case shown in Figure 10A, the phase shift of +β or -β and weights are applied to each azimuth vector, 
and a vector 75 for the travel of the view window 73 is thereby determined. The center point Q of the view 
window 73 travels in accordance with the vector 75 for the travel of the view window 73. Figure 10C shows 
the state of the view window 73 which has thus traveled. 

In the state shown in Figure 10C, a vector 75 for the travel of the view window 73 is determined in the 
same manner as that described above, and the center point Q of the view window 73 travels in accordance 
with the vector 75 for the travel of the view window 73 into the state shown in Figure 10D. Ultimately, the 
position of the center point Q of the view window 73 coincides with the position of the center point O of the 
candidate 71 for the predetermined object image, and the center point Q of the view window 73 stops. 

The value of β for the phase shift is set at a value, which falls within the range of 0 degrees to less than 
90-α degrees, such that the vector 75 for the travel of the view window 73 may be directed to the candidate 
71 for the predetermined object image. (If the value of β for the phase shift is larger than 90-α degrees, the 
vector 75 for the travel of the view window 73 will be directed to the side opposite to the candidate 71 for 
the predetermined object image.) 

The weights applied to the azimuth vectors 74 and 72 are set in accordance with the distances between 
the center point Q of the view window 73 and the components of the contour line tilted at the predetermined 
angles +α and -α. A larger weight is applied for a longer distance such that the vector 75 for the travel of 
the view window 73 may be directed to the center point O of the candidate 71 for the predetermined object 
image. 
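
The composition of the vector for the travel of the view window from phase-shifted, weighted azimuth vectors can be sketched in Python as follows. This is a simplified illustration under stated assumptions and not the network of Figure 8: the function name `travel_vector` is hypothetical, the weight is assumed to grow linearly with the distance (the specification only requires a larger weight for a longer distance), and the composite is taken as a plain vector sum in the Cartesian plane.

```python
import numpy as np

def travel_vector(azimuths, distances, signs, beta):
    """Compose a travel vector from detected tilted components (sketch).

    azimuths  : azimuth of each detected component around the center
                point of the view window, in radians.
    distances : distance of each component from the center point.
    signs     : +1 for a component tilted at +alpha, -1 for -alpha.
    beta      : phase shift in radians.
    """
    azimuths = np.asarray(azimuths, dtype=float)
    distances = np.asarray(distances, dtype=float)
    signs = np.asarray(signs, dtype=float)
    # Phase shift: +beta for +alpha components, -beta for -alpha components.
    shifted = azimuths + signs * beta
    # Assumption: the weight is proportional to the distance.
    weights = distances
    # Composite vector in the Cartesian plane.
    return np.array([np.sum(weights * np.cos(shifted)),
                     np.sum(weights * np.sin(shifted))])
```

For a pair of components placed symmetrically about the line joining the center point of the view window and the object, with opposite tilt signs, the components of the two vectors perpendicular to that line cancel, and the composite vector lies along the line.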

As for the value of α for the aforesaid predetermined angle, in cases where the center point Q of the 
view window 73 is located in the region outside of the candidate 71 for the predetermined object image, the 
α point will certainly be present if α falls within the range of 0 degrees to 90 degrees. (When α is 90 degrees, 
the azimuth vector comes into contact with the candidate 71 for the predetermined object image, and only a 
single α point is found.) Conversely, in cases where the center point Q of the view window 73 is located 
within the region inside of the candidate 71 for the predetermined object image, no α point will be present, 
depending on the distance between the center point Q of the view window 73 and the center point O of the 
candidate 71 for the predetermined object image and on the value of α. Experiments revealed that, in cases 
where the candidate 71 for the predetermined object image is circular as in this example, when α is 25 
degrees, the α point becomes absent if the distance between the center point Q of the view window 73 and 
the center point O of the candidate 71 for the predetermined object image becomes equal to approximately 
60% of the radius of the candidate 71 for the predetermined object image. When α is 45 degrees, the α 
point becomes absent if the distance between the center point Q of the view window 73 and the center 
point O of the candidate 71 for the predetermined object image becomes equal to approximately 80% of 
the radius of the candidate 71 for the predetermined object image. Therefore, when the value of α is small, 
the center point Q of the view window 73 can travel to the center point O of the candidate 71 for the 
predetermined object image. However, actually, the candidate for the predetermined object image is not a 
true circle but has one of various shapes. Accordingly, the value of α should preferably be set in 
accordance with the candidate for the predetermined object image which is to be extracted. Experience 
shows that, as for a natural image, the value of α should preferably fall within the range of approximately 20 
to 30 degrees. 

In cases where the center point Q of the view window 73 and the center point O of the circular 
candidate 71 for the predetermined object image coincide with each other, as illustrated in Figure 5A, the 
contour line of the candidate 71 for the predetermined object image, which line is included in the center 
part of the view window, is expressed as a straight line parallel to the annular direction in the complex-log 
mapped plane. Specifically, moving the center point of the view window to the center point of the candidate 
for the predetermined object image is equivalent to operating such that the contour line extending in the 
annular direction, which line is selected by the e1-layer, may be constituted of as many components parallel 
to the annular direction as possible. 

How the vector for the travel of the view window is determined by the f2-, g2-, and h2-layers and how 
the view window travels in accordance with the vector for the travel of the view window will be described 
hereinbelow. 

Figures 11A, 11B, 11C, and 11D are explanatory views showing complex-log mapped planes and 
corresponding Cartesian planes as an aid in explaining how a view window travels to a point of intersection 
of contour lines of a candidate for a predetermined object image. 

As illustrated in Figure 11A, when part of a candidate 76 for the predetermined object image comes into 
a view window 73, components of a contour line of a complex-log mapped candidate 76' for the 
predetermined object image, which are tilted at predetermined angles +α' and -α' in the complex-log 
mapped plane shown in Figure 11A, are detected by the f2-layer. The predetermined angles +α' and -α' 
are the angles of the tilted components of the contour line extending in the radial direction, which are 
extracted by the f2-layer, with respect to the annular direction. The angle of the tilt upward to the right with 
respect to the annular direction is taken as a positive angle, and the angle of the tilt upward to the left with 
respect to the annular direction is taken as a negative angle. The components tilted at the predetermined 
angles +α' and -α' are detected by the g2-layer as azimuth vectors 77, 77, 77, 77. For the purposes of 
facilitating the travel of the center point Q of the view window 73 toward the point of intersection T of the 
contour lines of the candidate 76 for the predetermined object image, a phase shift of +β' is applied to the 
azimuth vector, which corresponds to the component tilted at the predetermined angle +α', and a phase 
shift of -β' is applied to the azimuth vector, which corresponds to the component tilted at the predetermined 
angle -α'. Also, weights are applied to the azimuth vectors, which correspond to the components tilted at the 
predetermined angles +α' and -α'. In this manner, azimuth vectors 77', 77', 77', 77' in the Cartesian plane 
are obtained. In the h2-layer, a composite vector is obtained from the azimuth vectors 77', 77', 77', 77'. The 
composite vector is determined as a vector 78 for the travel of the view window 73, and the information 
representing the composite vector is fed out from the i-layer. When the vector 78 for the travel of the view 
window 73 is thus determined, the view window 73 travels in accordance with the vector 78 for the travel of 
the view window 73. Figure 11B shows the state of the view window 73 which has thus traveled. 

In the state shown in Figure 11B, a vector 78 for the travel of the view window 73 is determined in the
same manner as that described above, and the center point Q of the view window 73 travels in accordance 
with the vector 78 for the travel of the view window 73 into the state shown in Figure 11C and the state 
shown in Figure 11D. Ultimately, the position of the center point Q of the view window 73 coincides with the 
point of intersection T of the contour lines of the candidate 76 for the predetermined object image, and the 
center point Q of the view window 73 stops. 

As for the value of α' for the predetermined angle, in cases where the angle of the vertex is small, and
at the same time the value of α' is small, two +α' points and two -α' points are present. However, if the
value of α' is larger than a certain value, only a single +α' point (and only a single -α' point) will be present.
If the value of α' is less than 90 degrees, at least a single α' point will be present. Under such a condition
that two α' points are present, a vector merely composed from the azimuth vectors directed from the center
point of the view window 73 to the two α' points serves as the vector for the travel of the view window 73
towards the vertex. However, problems will occur if only a single α' point is present or if no such point is
present. Experiments have shown that, if the value of α' for the predetermined angle is small, the α' point
becomes absent as the vertex angle of the object image becomes larger. Therefore, in order to ensure that
the view window 73 travels towards the vertex regardless of what value of angle the vertex may have, it is
first necessary that the α' point is present. Accordingly, the value of α' for the predetermined angle should
preferably be close to 90 degrees.

As illustrated in Figure 12, in cases where the center point Q of the view window and the vertex lie on a
single horizontal line, the vertex angle extends symmetrically upwards and downwards with respect to the
horizontal line, and only a single α' point is present, the vector 78 for the travel of the view window, which
vector is composed from the azimuth vectors 77, 77, is directed to the side opposite to the vertex. In order
to eliminate such a problem, as illustrated in Figure 13, phase shifts of +β' and -β' should be applied such
that the vector 78 for the travel of the view window may be directed to the vertex. For this purpose, the
phase shifts of +β' and -β' should be applied such that the azimuth vectors 77, 77 may be brought at least
to the side rightward from a perpendicular 79, which passes through the center point Q of the view window.
The value of β' is set so as to satisfy the conditions

β' > α' - C/2, and
0 < β' < 90°

where C represents the value of the angle of the vertex to be detected and satisfies the condition of
0 < C < 360°.

By the application of the appropriate phase shift and weighting, the center point Q of the view window
73 can be moved to the point of intersection of the contour lines of the candidate 76 for the predetermined 
object image regardless of at what angle the contour lines may intersect with each other. 
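
The effect of the phase shift can be illustrated with a minimal sketch. It assumes that each azimuth vector is a unit vector given by its direction angle in the Cartesian plane, that counter-clockwise rotation is positive, and that the vector for a +α' component is rotated by +β' and the vector for a -α' component by -β'. The function names `travel_vector` and `beta_is_admissible` and the example angles are hypothetical and are not taken from the embodiment.

```python
import math

def travel_vector(azimuths_deg, signs, beta_deg=0.0, weights=None):
    """Compose phase-shifted, weighted azimuth vectors into one travel vector.

    azimuths_deg: direction of each azimuth vector, in degrees (assumed convention).
    signs: +1 for a vector of a +alpha' component, -1 for a -alpha' component.
    """
    if weights is None:
        weights = [1.0] * len(azimuths_deg)
    x = y = 0.0
    for theta, sign, weight in zip(azimuths_deg, signs, weights):
        shifted = math.radians(theta + sign * beta_deg)  # apply +beta' or -beta'
        x += weight * math.cos(shifted)
        y += weight * math.sin(shifted)
    return x, y

def beta_is_admissible(beta_deg, alpha_deg, vertex_deg):
    """Check beta' > alpha' - C/2 and 0 < beta' < 90 degrees."""
    return 0.0 < beta_deg < 90.0 and beta_deg > alpha_deg - vertex_deg / 2.0

# Vertex assumed to lie to the right of the view window center.
# Both unshifted azimuth vectors point to the left, away from the vertex.
unshifted = travel_vector([210.0, 150.0], [+1, -1])
shifted = travel_vector([210.0, 150.0], [+1, -1], beta_deg=80.0)
```

In this assumed geometry the unshifted composite vector has a negative horizontal component, whereas the composite vector obtained with β' = 80° points toward the vertex.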

In cases where the center point Q of the view window 73 and the vertex of the candidate 76 for the
predetermined object image coincide with each other, as illustrated in Figure 5B, the contour lines of the
candidate 76 for the predetermined object image, which is cut out in accordance with the view window 73,
are expressed as straight lines parallel to the radial direction in the complex-log mapped plane. Specifically,
moving the center point of the view window to the vertex of the candidate for the predetermined object
image is equivalent to operating such that the contour line extending in the radial direction, which line is
selected by the e2-layer, may be constituted of as many components parallel to the radial direction as
possible.

In the i-layer, the vector for the travel of the view window, which vector has been obtained in the
h1-layer, and the vector for the travel of the view window, which vector has been obtained in the h2-layer, are
combined appropriately. The information representing the composite vector for the travel of the view
window is fed out of the i-layer. How the i-layer functions will be described hereinbelow.

It often occurs that the vector for the travel of the view window, which vector has been obtained in the
h1-layer, and the vector for the travel of the view window, which vector has been obtained in the h2-layer,
become different from each other, depending on the circumstances. For example, in cases where the center
point of the view window is located in the region outside of the predetermined object image, the aforesaid
two types of the vectors for the travel of the view window are directed to the same direction. In cases where
the center point of the view window is located in the immediate vicinity of the predetermined object image
or in the region inside of the predetermined object image, the aforesaid two types of the vectors for the
travel of the view window become different from each other. Therefore, when the outputs of the neural
network, which are determined by the h1- and h2-layers, are utilized in a well-balanced manner, the view
window can be moved in accordance with a desired predetermined object image. For example, in cases
where a characteristic shape of part of a predetermined object image is to be detected and utilized for
learning operations, or the like, much importance may be attached in the i-layer to the output obtained from
the h2-layer. In cases where the whole predetermined object image is to be found, much importance may
be attached in the i-layer to the output obtained from the h1-layer.
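
One simple way to picture the balancing carried out in the i-layer is a weighted sum of the two vectors. The sketch below is only an illustration under that assumption; the text does not specify the combination rule, and the name `combine_travel_vectors` and the parameter `h2_importance` are hypothetical.

```python
import numpy as np

def combine_travel_vectors(v_h1, v_h2, h2_importance=0.5):
    """Blend the h1-layer and h2-layer travel vectors (assumed weighted sum).

    h2_importance near 1 favors the vertex-seeking h2 output;
    h2_importance near 0 favors the whole-object h1 output.
    """
    if not 0.0 <= h2_importance <= 1.0:
        raise ValueError("h2_importance must lie in [0, 1]")
    v_h1 = np.asarray(v_h1, dtype=float)
    v_h2 = np.asarray(v_h2, dtype=float)
    return (1.0 - h2_importance) * v_h1 + h2_importance * v_h2

# Example: the whole object image is to be found, so h1 dominates.
travel = combine_travel_vectors((1.0, 0.0), (0.0, 1.0), h2_importance=0.25)
```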

At the same time as when the vector for the travel of the view window is calculated from the contour
line of the candidate for the predetermined object image in the manner described above, a vector for the 
travel of the view window is also calculated from a movement of the candidate for the predetermined object 
image. 

In the case of a moving image, there is a strong probability that an object moving with respect to the
background will constitute a candidate for the predetermined object image. Therefore, the candidate for the
predetermined object image can be extracted accurately by canceling the background, extracting only the
moving object, and causing the view window to travel.

How the vector for the travel of the view window is calculated from the movement of an object in the
neural network processing unit 3 for search will be described hereinbelow.

Figure 14 is a block diagram showing a fundamental concept behind the calculation of a vector for the
travel of the view window from the movement of an object. In this embodiment, a neural network is utilized
in order to extract a candidate for a yacht image, which serves as a candidate for the predetermined object
image, from an image.

As illustrated in Figure 14, in this embodiment, in a step A in the neural network, the position of the
center point of the view window is set in the image, and part of the image falling in the region inside of the
view window is fetched from the image. Thereafter, in a step B, a movement of the image, i.e., a movement
of the image in an in-plane parallel direction, an in-plane rotating direction, and/or a radial direction, is
detected. In a step C, the movement of the background is canceled (compensated for). Thereafter, in a step
D, an object, which exhibits a movement different from the movement of the background, i.e., a candidate
for the predetermined object image, is detected. In a step E, the view window is caused to travel to the
object that exhibits the different movement.

Movements of an image correspond to the three types of movements described below.

(1) A movement, which is parallel to the image plane, in a plane, which is normal to the line of vision.
(This movement corresponds to an eyeball movement or a movement of an image input device, such as
a camera, in a direction which is normal to the line of vision.)

(2) A movement of enlargement or reduction around the viewpoint. (This movement corresponds to a
movement of an image input device in a direction heading to or away from the outer world along the line
of vision.)

(3) A clockwise or counter-clockwise rotation movement around the line of vision. (This movement
corresponds to a rotation movement of an image input device around the line of vision.)

Any movement can be classified into one of these three movements. 

Figure 15 is an explanatory view showing an example of a neural network for detecting the movement of 
the image. 

A signal, which represents part of an image of the outer world and which is obtained through the view
window having a predetermined size, is fed into the neural network. Also, a signal representing the direction,
in which the background in the image moves, is fed out of the neural network. The aforesaid parallel
movement is space invariant in the Cartesian plane. Therefore, the detection of the parallel movement
should preferably be carried out in the Cartesian plane. On the other hand, the enlargement or reduction
movement and the rotation movement are space variant in the Cartesian plane but can be processed as
being space invariant after being transformed with the complex-log mapping. Therefore, the detection of the
enlargement or reduction movement and the rotation movement should preferably be carried out in the
complex-log mapped plane.
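
The space invariance obtained with the complex-log mapping can be checked numerically. The following is a minimal sketch, assuming the mapping takes a point z = x + iy, measured from the center point of the view window, to (ln|z|, arg z); the function name `complex_log_map` and the sample points are hypothetical. An enlargement by a factor s then shifts every point by ln s along the radial axis, and a rotation by an angle φ shifts every point by φ along the annular axis.

```python
import numpy as np

def complex_log_map(points_xy, center=(0.0, 0.0)):
    """Map Cartesian points to (log radius, angle) about the given center."""
    pts = np.asarray(points_xy, dtype=float) - np.asarray(center, dtype=float)
    z = pts[:, 0] + 1j * pts[:, 1]
    w = np.log(z)  # ln|z| + i*arg(z) for nonzero complex z
    return np.column_stack([w.real, w.imag])

points = np.array([[1.0, 0.1], [0.5, 1.0], [-1.0, -2.0]])

# Enlargement by a factor of 2 about the center.
enlarged = 2.0 * points

# Counter-clockwise rotation by 0.5 rad about the center.
phi = 0.5
rotation = np.array([[np.cos(phi), -np.sin(phi)],
                     [np.sin(phi),  np.cos(phi)]])
rotated = points @ rotation.T

shift_enlarged = complex_log_map(enlarged) - complex_log_map(points)
shift_rotated = complex_log_map(rotated) - complex_log_map(points)
```

Every row of `shift_enlarged` is the same radial offset and every row of `shift_rotated` is the same annular offset, which is what allows a single space-invariant weights pattern to detect these movements. The sample points are chosen so that the rotated angles do not wrap around at ±π.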

The layers of the neural network employed in this embodiment have the functions described below.

This neural network is divided into a neural network 10, which detects a parallel movement of the
background, and a neural network 20, which detects an enlargement or reduction movement and a rotation
movement. The basic structure of the neural network, which is employed in this embodiment, is of the
layered type. The neural network 10 for detecting a parallel movement of the background is constituted of
an A1-layer provided with the view window having a predetermined size, in which layer the image is cut out
with the size of the view window and fed into the neural network, and an A2-layer for imparting a time lag to
the cut-out image. The neural network 10 is also constituted of a C1-layer for detecting the contour lines of
an object image, which is embedded in the image having been cut out in the A1-layer, and a C2-layer for
detecting the contour lines of the object image, which is embedded in the image having been imparted with
the time lag in the A2-layer. The neural network 10 is additionally constituted of a D1-layer for finding the
difference between the image, which has been obtained from the detection of the contour lines in the
C1-layer, and the image, which has been obtained from the detection of the contour lines in the C2-layer. The
neural network 10 is further constituted of E1-, E2-, E3-, and E4-layers for extracting only the components in
predetermined directions from the difference, which has been found in the D1-layer. The neural network 10
is also constituted of an F1-layer for generating an output, which represents the movement of the background
as the azimuth of a parallel movement, from the components of the contour lines having been extracted in
the E1-, E2-, E3-, and E4-layers.

On the other hand, the neural network 20 is constituted of an A3-layer provided with the view window
having a predetermined size, in which layer the image is cut out with the size of the view window and is fed
into the neural network, and an A4-layer for imparting a time lag to the cut-out image. The neural network
20 is also constituted of a B1-layer for transforming the image, which has been cut out in the A3-layer, with
the complex-log mapping, and a B2-layer for transforming the image, which has been imparted with the
time lag in the A4-layer, with the complex-log mapping. The neural network 20 is additionally constituted of
a C3-layer for detecting the contour lines of an object image, which is embedded in the complex-log
mapped image obtained in the B1-layer, and a C4-layer for detecting the contour lines of the object image,
which is embedded in the complex-log mapped image obtained in the B2-layer. The neural network 20 is
further constituted of a D2-layer for finding the difference between the image, which has been obtained from
the detection of the contour lines in the C3-layer, and the image, which has been obtained from the
detection of the contour lines in the C4-layer. The neural network 20 also has E5- and E6-layers for
extracting only the components in the radial direction from the difference having been found in the D2-layer.
The neural network 20 additionally has E7- and E8-layers for extracting only the components in the annular
direction from the difference having been found in the D2-layer. The neural network 20 further has an
F2-layer for generating an output, which represents the movement of the background as the enlargement or
reduction movement, i.e., as the azimuth of a radial movement, from the components of the contour lines
having been extracted in the E5- and E6-layers. The neural network 20 also has an F3-layer for generating
an output, which represents the movement of the background as the azimuth of a rotation movement, from
the components of the contour lines having been extracted in the E7- and E8-layers.

In the neural network employed in this embodiment, the synaptic connections between adjacent ones
among the B-layer and those that follow are space invariant. Also, the synaptic connections in each of these
layers are space invariant. The space invariance is adopted in order to facilitate the model simulation and
parallel processing, and the synaptic connections need not necessarily be space invariant. However, in
cases where the synaptic connections are space invariant, the output of each neuron becomes equivalent to
an output obtained when the result of an operation of convolution (i.e., matching with a synaptic weights
pattern) of the synaptic connection matrix and the preceding neuron layer matrix is passed through a
nonlinear function. Therefore, computational processing becomes markedly easy when simulation of the
neural network is carried out with a computer. In the neural network employed in this embodiment, positive
neuron signals and positive and negative synaptic connections are used. Specifically, when a positive signal
is transmitted through a positive connection, it is transmitted as a positive excitatory signal. When a positive
signal is transmitted through a negative connection, it is transmitted as a negative excitatory signal.
However, the neuron signals are not limited to positive signals, and a neuron model may be employed
wherein neuron signals include positive and negative excitatory signals, synaptic connections include
positive and negative connections and wherein, when a negative signal passes through a negative
connection, it is transmitted as a positive excitatory signal.
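
The equivalence between space-invariant synaptic connections and a convolution followed by a nonlinear function can be sketched as follows. This is an illustration only: the 3x3 center-surround weights pattern is an assumed stand-in for a contour-extracting pattern, the nonlinear function is assumed to be a simple rectification that keeps the neuron signals positive, and the name `neuron_layer` is hypothetical.

```python
import numpy as np

def neuron_layer(previous_layer, weights_pattern):
    """Match one shared weights pattern at every position, then rectify.

    Because the same weights are used at every position (space invariance),
    the whole layer is a single sliding-window matching operation.
    """
    kh, kw = weights_pattern.shape
    out_h = previous_layer.shape[0] - kh + 1
    out_w = previous_layer.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for r in range(out_h):
        for c in range(out_w):
            window = previous_layer[r:r + kh, c:c + kw]
            out[r, c] = np.sum(window * weights_pattern)
    return np.maximum(out, 0.0)  # assumed nonlinearity: positive signals only

# Assumed contour-extracting pattern: excitatory center, inhibitory surround.
contour_pattern = np.array([[-1.0, -1.0, -1.0],
                            [-1.0,  8.0, -1.0],
                            [-1.0, -1.0, -1.0]])

flat_image = np.ones((5, 5))
step_image = np.zeros((5, 6))
step_image[:, 3:] = 1.0  # bright region to the right of a vertical edge

flat_response = neuron_layer(flat_image, contour_pattern)
step_response = neuron_layer(step_image, contour_pattern)
```

A uniform region yields no output, whereas the bright side of the step edge yields a positive response, so that only the contour line survives.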

The neural network 10 has the functions described below. First, in the A1-layer, an image 11A at the
time t, which is shown in Figure 16A and which falls in the region inside of the view window having a
predetermined size, is cut out. Also, in the A2-layer, an image 11B at the time t + α, which is shown in
Figure 16B and which falls in the region inside of the view window having the predetermined size, is cut out.
To the image 11B at the time t + α, a movement has been added which is parallel to the image plane and
which is reverse to a movement of an image input device, such as a camera. As compared with the image
11A, the image 11B has moved upwardly (the image input device has moved downwardly). In Figures 16A
and 16B, the white parts in each image represent that the signal intensity is high, and black parts in each
image represent that the signal intensity is low. Thereafter, in the C1- and C2-layers, contour lines of object
images, which are embedded in the images 11A and 11B having been cut out in the A1- and A2-layers, are
detected. Specifically, the images 11A and 11B are transmitted as signals weighted with a synaptic weights
pattern 12 for extracting the contour lines, which is shown in Figure 17. In this manner, the contour lines in
the images 11A and 11B are detected, which contour lines are indicated by images 13A and 13B shown in
Figures 18A and 18B.

Thereafter, in the D1-layer, an operation is carried out in order to find the difference between the image
13A, which has been obtained from the detection of the contour lines in the C1-layer, and the image 13B,
which has been obtained from the detection of the contour lines in the C2-layer. Specifically, the operation
expressed as (image 13A - image 13B) is carried out. In this manner, an image 14 shown in Figure 19 is
obtained. Specifically, the D1-layer receives neuron signals with positive weights from the C1-layer and
receives neuron signals with negative weights from the C2-layer. The hatched parts in Figure 19 indicate
that the signal value is zero. Thereafter, in the E1-, E2-, E3-, and E4-layers, the image 14 is transmitted as a
signal weighted with synaptic weights patterns for detecting movements parallel to the image plane, and
components representing the respective directions of the movements are extracted. As an aid in facilitating
the explanation, in this example, only two types of components of movements, i.e., the components in the
upward direction and the components in the leftward direction, are extracted by using a synaptic weights
pattern 15 for extracting the upward components and a synaptic weights pattern 16 for extracting the
leftward components, which patterns are shown in Figure 20. An image 17A shown in Figure 21A and an
image 17B shown in Figure 21B respectively indicate the results of the transmission of the image 14 as the
signal weighted by the synaptic weights pattern 15 and the synaptic weights pattern 16. Information
representing the components of the movements of the contour lines, which components have been
extracted in the E1-, E2-, E3-, and E4-layers, is fed into the F1-layer. Thereafter, the portion of the neural
network constituted of F1-, F1'-, and F1"-layers detects which direction of movement components were
extracted more, and an output is generated which represents the azimuth of the movement in the direction
parallel to the image plane. Specifically, in this embodiment, as indicated by the image 17A shown in Figure
21A, the upward components have the highest signal intensity. Therefore, a signal having a high intensity is
fed out from a neuron which represents that the upward movement occurred, and no signal is fed out from a
neuron which represents that the leftward movement occurred.
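
The chain of contour difference and directional extraction can be sketched in a simplified form. The sketch below assumes that the contour images are already available as binary arrays, that each directional weights pattern has a weight of +1 at the neuron position and -1 at the neighboring position lying in the direction of movement, and that each neuron has a threshold of 1. These weights, the threshold, and the names `rectangle_contour` and `direction_scores` are illustrative assumptions and are not the patterns 15 and 16 shown in Figure 20.

```python
import numpy as np

def rectangle_contour(shape, top, left, bottom, right):
    """Binary contour image of an axis-aligned rectangle (stand-in for image 13A)."""
    img = np.zeros(shape)
    img[top, left:right + 1] = 1.0
    img[bottom, left:right + 1] = 1.0
    img[top:bottom + 1, left] = 1.0
    img[top:bottom + 1, right] = 1.0
    return img

def direction_scores(contour_t, contour_t_alpha):
    """Score each movement direction from the difference of two contour images."""
    difference = contour_t - contour_t_alpha  # (image at t) - (image at t + alpha)
    offsets = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}
    scores = {}
    for name, (d_row, d_col) in offsets.items():
        # neighbour[r, c] == difference[r + d_row, c + d_col]
        neighbour = np.roll(difference, shift=(-d_row, -d_col), axis=(0, 1))
        # A neuron fires where the old contour (+1) sits next to the new contour (-1).
        response = np.maximum(difference - neighbour - 1.0, 0.0)
        scores[name] = float(response.sum())
    return scores

contour_t = rectangle_contour((12, 12), top=4, left=3, bottom=8, right=8)
contour_t_alpha = np.roll(contour_t, shift=-1, axis=0)  # image moved up by one row
scores = direction_scores(contour_t, contour_t_alpha)
detected = max(scores, key=scores.get)
```

With the rectangle moved upward by one row, only the neurons tuned to the upward direction respond, which corresponds to a high-intensity signal from the "upward" neuron and no signal from the others.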
The neural network 20 has the functions described below. 

First, how the neural network 20 works when an image input device, such as a camera, moves towards
the image and cuts out the image will be described hereinbelow.

First, in the A3-layer, an image 21A at the time t, which is shown in Figure 22A and which falls in the
region inside of the view window having a predetermined size, is cut out. Also, in the A4-layer, an image
21B at the time t + α, which is shown in Figure 22B and which falls in the region inside of the view window
having the predetermined size, is cut out. To the image 21B at the time t + α, a movement causing the
image to enlarge has been added by the movement of an image input device, such as a camera, towards
the outer world. As compared with the image 21A, the image 21B has been enlarged. Thereafter, in the B1-
and B2-layers, the images 21A and 21B having been cut out in the A3- and A4-layers are transformed with
the complex-log mapping. In this manner, images 22A and 22B shown in Figures 23A and 23B are
obtained.

Thereafter, in the C3- and C4-layers, contour lines of object images, which are embedded in the images
22A and 22B obtained from the transformation with the complex-log mapping in the B1- and B2-layers, are
detected. Specifically, the images 22A and 22B are transmitted as signals weighted by the synaptic weights
pattern 12 for extracting the contour lines, which is shown in Figure 17. In this manner, the contour lines in
the images 22A and 22B are detected, which contour lines are indicated by images 23A and 23B shown in
Figures 24A and 24B.

Thereafter, in the D2-layer, an operation is carried out in order to find the difference between the image
23A, which has been obtained from the detection of the contour lines in the C3-layer, and the image 23B,
which has been obtained from the detection of the contour lines in the C4-layer. Specifically, the operation
expressed as (image 23A - image 23B) is carried out. In this manner, an image 24 shown in Figure 25 is
obtained. Thereafter, in the E5- and E6-layers, the image 24 is transmitted as a signal weighted by synaptic
weights patterns for detecting movements in the radial direction, and components representing the
respective directions of the movements are extracted. Specifically, components of the movements of the
image in the radial direction, which movements are due to the enlargement and reduction, are extracted by
using a synaptic weights pattern 25 for extracting the components in the enlarging direction and a synaptic
weights pattern 26 for extracting the components in the reducing direction, which patterns are shown in
Figure 26. An image 27A shown in Figure 27A and an image 27B shown in Figure 27B respectively indicate
the results of the transmission of the image 24 as the signal weighted by the synaptic weights pattern 25
and the synaptic weights pattern 26. Information representing the components of the movements of the
contour lines, which components have been extracted in the E5- and E6-layers, is fed into the F2-layer.
Thereafter, the portion of the neural network constituted of F2-, F2'-, and F2"-layers detects which direction
of movement components were extracted more, and an output is generated which represents the azimuth of
the radial movement. Specifically, in this embodiment, as indicated by the image 27A shown in Figure 27A
and the image 27B shown in Figure 27B, the intensities of the components in the enlarging direction are
high, whereas the intensities of the components in the reducing direction are low. Therefore, a signal having
a high intensity is fed out from a neuron which represents that the movement in the enlarging direction
occurred, and no signal is fed out from a neuron which represents that the movement in the reducing
direction occurred.

How the neural network 20 works when an image input device, such as a camera, rotates around the
center point of its view window and fetches the image will be described hereinbelow.
First, in the A3-layer, an image 31A at the time t, which is shown in Figure 28A and which falls in the
region inside of the view window having a predetermined size, is cut out. Also, in the A4-layer, an image
31B at the time t + α, which is shown in Figure 28B and which falls in the region inside of the view window
having the predetermined size, is cut out. To the image 31B at the time t + α, a rotation movement has been
added by the rotation movement of an image input device, such as a camera. As compared with the image
31A, the image 31B has been rotated clockwise around the center point of the view window. Thereafter, in
the B1- and B2-layers, the images 31A and 31B having been cut out in the A3- and A4-layers are
transformed with the complex-log mapping. In this manner, images 32A and 32B shown in Figures 29A and
29B are obtained. Thereafter, in the C3- and C4-layers, contour lines of object images, which are embedded
in the images 32A and 32B obtained from the transformation with the complex-log mapping in the B1- and
B2-layers, are detected. Specifically, the images 32A and 32B are transmitted as signals weighted by the
synaptic weights pattern 12 for extracting the contour lines, which is shown in Figure 17. In this manner, the
contour lines in the images 32A and 32B are detected, which contour lines are indicated by images 33A
and 33B shown in Figures 30A and 30B.

Thereafter, in the D2-layer, an operation is carried out in order to find the difference between the image
33A, which has been obtained from the detection of the contour lines in the C3-layer, and the image 33B,
which has been obtained from the detection of the contour lines in the C4-layer. Specifically, the operation
expressed as (image 33A - image 33B) is carried out. In this manner, an image 34 shown in Figure 31 is
obtained. Thereafter, in the E7- and E8-layers, the image 34 is transmitted as a signal weighted by synaptic
weights patterns for detecting movements in the rotating direction, and components representing the
respective rotating directions are extracted. Specifically, components of the rotation movements of the
image in the clockwise and counter-clockwise directions are extracted by using a synaptic weights pattern
35 for extracting the components in the clockwise rotating direction and a synaptic weights pattern 36 for
extracting the components in the counter-clockwise rotating direction, which patterns are shown in Figure
32. An image 37A shown in Figure 33A and an image 37B shown in Figure 33B respectively indicate the
results of the transmission of the image 34 as the signal weighted by the synaptic weights pattern 35 and
the synaptic weights pattern 36. Information representing the components of the movements of the contour
lines, which components have been extracted in the E7- and E8-layers, is fed into the F3-layer. Thereafter,
the portion of the neural network constituted of F3-, F3'-, and F3"-layers detects which rotating direction of
movement components were extracted more, and an output is generated which represents the azimuth of
the movement in the in-plane rotating direction. Specifically, in this embodiment, as indicated by the image
37A shown in Figure 33A and the image 37B shown in Figure 33B, the intensities of the components in the
clockwise rotating direction are high, whereas the intensities of the components in the counter-clockwise
rotating direction are low. Therefore, a signal having a high intensity is fed out from a neuron which
represents that the clockwise rotation movement occurred, and no signal is fed out from a neuron which
represents that the counter-clockwise rotation movement occurred.

The detection of the movement of the image from the components of the difference contour lines in the
F1-, F1'-, F1"-layers, the F2-, F2'-, F2"-layers, and the F3-, F3'-, F3"-layers can be achieved by repeatedly
feeding inputs into a perceptron type of neural network, giving instructor signals, which represent correct
results of judgment, to the neural network, and thereby carrying out learning operations of the neural
network. The neural network can thus be caused to grow such that it can detect movements of images. In this
manner, outputs representing correct results of detection of movements of images can be obtained from
difference contour lines of various images.
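
The learning with instructor signals can be illustrated with a single-neuron perceptron. The sketch below assumes two input features per sample, e.g. the total intensities of the upward and leftward components, and an instructor signal of 1 when the upward movement is the correct judgment. The toy samples, the learning rate, and the names `train_perceptron` and `predict` are hypothetical.

```python
import numpy as np

def predict(weights, bias, features):
    """Output 1 if the weighted input exceeds the threshold, else 0."""
    return 1 if weights @ np.asarray(features, dtype=float) + bias > 0 else 0

def train_perceptron(samples, instructor_signals, epochs=100, rate=1.0):
    """Perceptron learning rule: correct the weights whenever the output is wrong."""
    samples = np.asarray(samples, dtype=float)
    weights = np.zeros(samples.shape[1])
    bias = 0.0
    for _ in range(epochs):
        mistakes = 0
        for features, target in zip(samples, instructor_signals):
            error = target - predict(weights, bias, features)
            if error != 0:
                weights += rate * error * features
                bias += rate * error
                mistakes += 1
        if mistakes == 0:  # every instructor signal is reproduced
            break
    return weights, bias

# Toy data: (upward intensity, leftward intensity) -> 1 if upward movement.
samples = [[8.0, 0.0], [6.0, 1.0], [0.0, 8.0], [1.0, 5.0]]
instructor_signals = [1, 1, 0, 0]
weights, bias = train_perceptron(samples, instructor_signals)
```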

Thereafter, in the step C, the difference components representing the movement of the whole image,
i.e., the background, between the two images taken up with the time difference α can be canceled
(compensated for) in accordance with the movement of the whole image, which has been detected in the
manner described above. Specifically, a movement reverse to the detected movement may be added in
accordance with the direction and the amount of the detected image movement. Figure 34 is a flow chart
showing the process for canceling the movement of the background. For example, the signal S1, S2, or S3,
which represents the results of the detection of the movement, is added in a direction, which is reverse to
the direction of the detected movement, to the signal representing the image 11B, 21B, or 31B shown in
Figure 16B, 22B, or 28B. In this manner, an image, wherein the movement of the whole image, i.e., the
background, has been canceled, can be obtained from each image.

Thereafter, in the step D, an operation is carried out to find the difference between the image at the
time t + α, from which the movement of the whole image, i.e., the background, has been canceled in the
step C, and the image at the time t. In this manner, only the object, which shows a movement different from
the movement of the background, is extracted. Specifically, as illustrated in Figure 35A, an image 41A is
obtained by extracting the contour line components from the image at the time t, in which the movement of
the background has been canceled. Also, as illustrated in Figure 35B, an image 41B is obtained by
extracting the contour line components from the image at the time t + α, in which the movement of the
background has been canceled. The difference between the images 41A and 41B is found, and an image
42 shown in Figure 36 is thereby obtained. The object embedded in the image 42, i.e., the yacht in the
image in this embodiment, is the object showing a movement different from the movement of the
background. This object is taken as the candidate for the predetermined object image.
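
The steps C and D can be sketched for the case of a parallel movement. The sketch assumes that the detected background movement is available as an integer shift in rows and columns, that the compensation is a plain shift in the reverse direction, and that the difference is taken directly between the two frames rather than between contour images. The names `cancel_background` and `moving_object_mask` and the test frames are hypothetical.

```python
import numpy as np

def cancel_background(frame, detected_shift):
    """Step C: add a movement reverse to the detected (rows, cols) shift."""
    d_rows, d_cols = detected_shift
    return np.roll(frame, shift=(-d_rows, -d_cols), axis=(0, 1))

def moving_object_mask(frame_t, frame_t_alpha, detected_shift):
    """Step D: difference between the compensated later frame and the earlier frame."""
    compensated = cancel_background(frame_t_alpha, detected_shift)
    return (compensated - frame_t) != 0

# Background pattern with values 0..6; the object has intensity 100.
background = (np.arange(256).reshape(16, 16) % 7).astype(float)

frame_t = background.copy()
frame_t[5:8, 5:8] = 100.0                       # object at the time t

frame_t_alpha = np.roll(background, shift=-1, axis=0)  # background moved up one row
frame_t_alpha[6:9, 9:12] = 100.0                # object moved differently

mask = moving_object_mask(frame_t, frame_t_alpha, detected_shift=(-1, 0))
rows, cols = np.nonzero(mask)
```

After the compensation the background cancels exactly, and the remaining nonzero region covers only the old and new positions of the object, which is then taken as the candidate. The wrap-around of `np.roll` is harmless here only because the whole test background is shifted cyclically.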

Thereafter, in the step E, the view window is caused to travel towards the candidate for the
predetermined object image, which candidate shows the movement different from the movement of the
background. Figure 37 is a flow chart showing the process for carrying out the travel of the view window in
the step E. In the step E, in cases where a plurality of objects different from the background, i.e., a plurality
of candidates for the predetermined object image, have been detected in the step D, they are narrowed
down, and a candidate for the predetermined object image, which is most likely to be the predetermined
object image, is thereby selected. Thereafter, the view window is caused to travel towards the selected
candidate for the predetermined object image.

Figure 38 is an explanatory view showing an example of a neural network for calculating a vector for the
travel of the view window from the contour line of the candidate for the predetermined object image.
Elements constituting this neural network are equivalent to those constituting the neural network shown in
Figure 4, and are therefore numbered with corresponding primed reference numerals in Figure 38. The
signal, which represents the image 42 shown in Figure 36, i.e., the image representing the contour lines of
the object (the candidate for the predetermined object image) showing a movement different from the
movement of the background, is fed into the neural network shown in Figure 38. Also, the output obtained
from the neural network represents the amount and the direction, in which the view window is to be moved,
i.e., the vector for the travel of the view window.

At the same time as when the vectors for the travel of the view window are calculated from the contour
line of the candidate for the predetermined object image and the movement of the candidate for the
predetermined object image in the manner described above, a vector for the travel of the view window is
also calculated from the color of the candidate for the predetermined object image.

Figure 39 is an explanatory view showing an example of a neural network for determining a vector for
the travel of a view window from the color of a candidate for a predetermined object image.

As in the embodiment of the method in accordance with the present invention wherein the vector for the 
travel of the view window is calculated from the contour line, a signal, which represents an image of the 
outer world at the time t and which is obtained through the view window having a predetermined size, is fed 
into the neural network. Also, the output obtained from the neural network represents the amount and the 
direction in which the view window is to be moved. 

As in the neural network for calculating the vector for the travel of the view window from the movement 
of the candidate for the predetermined object image and the neural network for calculating the vector for the 
travel of the view window from the contour line of the candidate for the predetermined object image, the 
basic structure of the neural network in this embodiment is of the layered type. However, the neurons in an 
f"-layer are connected with each other in an interactive manner. Therefore, the neural network may also be 
referred to as a complex type. 

The layers of the neural network employed in this embodiment have the functions described below. As 
in the embodiment, wherein the vector for the travel of the view window is calculated from the movement of 
the candidate for the predetermined object image, and the embodiment, wherein the vector for the travel of 
the view window is calculated from the contour line of the candidate for the predetermined object image, 
this neural network is divided into the three steps. 

The step for setting the position of the center point of the view window in the image and fetching part of 
the image falling in the region inside of the view window is constituted of an a"-layer provided with the view 
window having a predetermined size, in which layer the image is cut out with the size of the view window 
and fed into the neural network, and a b"-layer for transforming the input image with the complex-log 
mapping (logarithmic polar coordinate transformation). 

The step for narrowing a candidate for the predetermined object image down from the input image is 
constituted of c"-, d"-, and e"-layers for detecting neurons, which fall within a region approximately 
coinciding in color with the candidate for the predetermined object image, and detecting the degree of 
coincidence in color between the region and the candidate for the predetermined object image. The step is 
also constituted of an f"-layer, which is an interactive connection circuitry layer for selecting a region that 
exhibits a high degree of coincidence in color with the candidate for the predetermined object image and 
that is closer in the shape and the size to the predetermined object image. 

The step for detecting a shift in position between the candidate for the predetermined object image and 
the view window and calculating the amount of travel of the view window is constituted of a g"-layer for 
detecting the azimuth and the distance of the region of the group of the neurons, which fall within a region 
approximately coinciding in color with the candidate for the predetermined object image, and an i"-layer for 
generating an output representing a vector for the travel of the view window in accordance with the detected 
azimuth and the detected distance. As in the aforesaid embodiments, in the neural network employed in this 
embodiment, the synaptic connections between adjacent layers from the b"-layer onward are 
space invariant. Also, the synaptic connections in the interactive connection layer are space invariant. 

First, in the a"-layer, part of the image falling within the region of the view window having the 
predetermined size is cut out. In the a"-layer, the density of the neurons provided is higher at part closer to 
the center point of the view window. In the b"-layer, the image, which has been fetched in the a"-layer, is 
transformed with the complex-log mapping by taking the center point of the view window as the pole of 
mapping. Because the image is thus transformed with the complex-log mapping in the b"-layer, the 
processes in the b"-layer and the layers that follow can be achieved by the convolution with a space 
invariant synaptic weights pattern. 

Figure 40 shows the results of transformation of various figures with the complex-log mapping, which is 
carried out in the b"-layer of the neural network shown in Figure 39. With the complex-log mapping, in 
cases where a candidate for the predetermined object image is located near the pole of mapping, i.e., the 
center point of the view window, the candidate for the predetermined object image is transformed into a 
comparatively large size. In cases where a candidate for the predetermined object image is located at a 
position remote from the pole of mapping, i.e., the center point of the view window, the candidate for the 
predetermined object image is transformed into a comparatively small size. The neurons of the b"-layer are 
excited in accordance with the intensity of the transformed image pattern. 
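
By way of a hedged illustration only, the complex-log mapping about the center point of the view window may be sketched in Python as follows. The function names `complex_log_map` and `angular_extent` and the sample sizes are assumptions introduced for this sketch and do not appear in the specification. The sketch shows that an object of a fixed width subtends a larger azimuth range when it lies near the pole of mapping than when it lies remote from it.

```python
import math

def complex_log_map(x, y, cx, cy):
    """Map the point (x, y) to (log radius, azimuth) about the pole (cx, cy)."""
    dx, dy = x - cx, y - cy
    radius = math.hypot(dx, dy)
    if radius == 0.0:
        raise ValueError("the pole itself has no finite image under the log mapping")
    return math.log(radius), math.atan2(dy, dx)

def angular_extent(width, distance):
    """Azimuth range (radians) subtended by an object of the given width
    whose center lies at the given distance from the pole."""
    return 2.0 * math.atan2(width / 2.0, distance)

# The same object (width 4) placed near the pole and remote from the pole.
near = angular_extent(4.0, 5.0)
far = angular_extent(4.0, 50.0)
```

In this sketch `near` exceeds `far`, which corresponds to the comparatively large and comparatively small transformed sizes described above.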

The portion of the neural network constituted of the c"-, d"-, and e"-layers, which portion represents an 
excitation pattern of the neurons corresponding to the complex-log mapped image, extracts the region 
approximately coinciding in color with the candidate for the predetermined object image, and detects the 
degree of coincidence in color between the extracted region and the candidate for the predetermined object 
image. The neurons in the e"-layer are excited in accordance with the detected degree of coincidence in 
color. The synaptic connections for extracting the neurons falling within the region approximately coinciding 
in color with the candidate for the predetermined object image are determined through learning operations 
by a back propagation method, or the like. 

In this embodiment, by way of example, a neural network having a three-layered structure may be 
employed as the portion of the aforesaid neural network for detecting the degree of coincidence in color 
with the candidate for the predetermined object image. Such an example will hereinbelow be described in 
detail. 

Figure 41 is an explanatory view showing in more detail the c"-layer, the d"-layer, and the e"-layer of 
the neural network shown in Figure 39, which is employed in the embodiment of the first step of the method 
for recognizing an object image in accordance with the present invention. 

The portion constituted of the c"-layer, the d"-layer, and the e"-layer may be regarded as a 
three-layered neural network, wherein the c"-layer serves as an input layer, the d"-layer serves as an 
intermediate layer, and the e"-layer serves as an output layer. 

The signal fed into the c"-layer is the color information received from each of the neurons of the 
preceding neuron layer. In the c"-layer, the color information received from the preceding neuron layer is 
separated into the three primary colors of light (red, green, and blue), and a single neuron receives each 
piece of the color information. Therefore, in the c"-layer, three neurons are allocated to a single picture 
element. The signal fed into the c"-layer is weighted by the synaptic connections and transmitted to the 
d"-layer. The weights of the synaptic connections from the c"-layer to the d"-layer are represented by W1ij, 
where i represents the neurons of the c"-layer and j represents the neurons of the d"-layer. 

The d"-layer is the neuron layer referred to as the intermediate layer. In general, the number of 
elements is selected appropriately in accordance with the level of complexity of the process to be carried 
out. In this example, five neurons are allocated to a single picture element. Therefore, the signal 
transmission from three neurons to five neurons per picture element is carried out by 15 synaptic 
connections. 

Signal transmission from the d"-layer to the e"-layer is then carried out. The e"-layer corresponds to 
the output layer of the three-layered neural network. This neural network detects the degree of coincidence 
in color with the candidate for the predetermined object image, and the degree of coincidence in color 
between a certain picture element and the candidate for the predetermined object image can be 
represented as the level of excitation of a single neuron. Therefore, a single neuron is allocated to each picture 
element. Accordingly, the signal transmission from the d"-layer to the e"-layer is carried out by five 
synaptic connections per picture element. The weights of the synaptic connections from the d"-layer to the 
e"-layer are represented by W2ij, where i represents the neurons of the d"-layer and j represents the 
neurons of the e"-layer. 

Actually, the neurons and the synaptic connections are provided in accordance with the number of 
picture elements. The signal transmission from the c"-layer to the e"-layer can be carried out 
simultaneously for the respective picture elements. In this neural network, the processes to be carried out are 
identical for the respective picture elements. Therefore, the characteristics of the neurons and the weights of 
the synaptic connections are identical for the respective picture elements. 
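
The three-layered portion described above (three input neurons, five intermediate neurons, and one output neuron per picture element, with identical weights for every picture element) may be sketched as follows. This is a minimal sketch under stated assumptions: the sigmoid activation, the absence of threshold terms, the function name `color_coincidence`, and the placeholder weight values are all illustrative and are not the values of Table 1.

```python
import math

def sigmoid(u):
    return 1.0 / (1.0 + math.exp(-u))

def color_coincidence(rgb, w1, w2):
    """Forward pass for one picture element: 3 inputs -> 5 hidden -> 1 output.
    w1[i][j]: weight from input neuron i to intermediate neuron j (15 connections).
    w2[j]: weight from intermediate neuron j to the single output neuron (5 connections)."""
    hidden = [sigmoid(sum(rgb[i] * w1[i][j] for i in range(3))) for j in range(5)]
    return sigmoid(sum(hidden[j] * w2[j] for j in range(5)))

# Placeholder weights (NOT the values of Table 1), shared by all picture elements.
W1 = [[0.1 * (i + 1) * (1 if j % 2 == 0 else -1) for j in range(5)] for i in range(3)]
W2 = [0.5, -0.5, 0.5, -0.5, 0.5]

# One row of two picture elements, each given as (red, green, blue) in [0, 1].
image = [[(0.9, 0.6, 0.5), (0.1, 0.2, 0.8)]]
e_layer = [[color_coincidence(p, W1, W2) for p in row] for row in image]
```

Because the same `W1` and `W2` are applied to every picture element, the per-element computations are independent and could be carried out simultaneously.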

The synaptic connections for carrying out the operations for detecting the degree of coincidence in 
color with the candidate for the predetermined object image can be obtained through learning operations of 
the neural network. By way of example, Table 1 shows the values of synaptic weights, which may be 
employed in cases where the flesh color of a human face is taken as the color of the candidate for the 
predetermined object image and it is defined that the frequency distribution of the flesh color extracted from 
a certain population is proportional to the degree of coincidence in color. 

The learning operations can be carried out by using a back propagation method. First, signals 
representing three primary color values of various colors, including the flesh color, are employed as input 
signals. Signals representing the degrees of coincidence in color, which are determined from the 
frequency distribution of the flesh color, are employed as instructor signals. The input signals and the 
instructor signals are repeatedly given to the neural network. In accordance with differences between the 
instructor signals and actual outputs of the neural network, the weights of the synaptic connections are 
changed little by little such that the differences may become smaller. The learning operations are carried 
out in this manner until the neural network generates outputs identical with the instructor signals with 
respect to given inputs. 
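
The learning operations described above may be illustrated with the following minimal sketch of a back propagation loop for a 3-5-1 network. The sample colors, the instructor values, the learning rate, the fixed number of epochs (used here instead of iterating until the outputs are identical with the instructor signals), the sigmoid activation, and the omission of threshold terms are all assumptions made for this sketch.

```python
import math
import random

def sigmoid(u):
    return 1.0 / (1.0 + math.exp(-u))

def forward(rgb, w1, w2):
    hidden = [sigmoid(sum(rgb[i] * w1[i][j] for i in range(3))) for j in range(5)]
    return hidden, sigmoid(sum(hidden[j] * w2[j] for j in range(5)))

def mean_squared_error(samples, w1, w2):
    return sum((t - forward(x, w1, w2)[1]) ** 2 for x, t in samples) / len(samples)

def train(samples, epochs=2000, rate=0.5, seed=0):
    rng = random.Random(seed)
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(5)] for _ in range(3)]
    w2 = [rng.uniform(-0.5, 0.5) for _ in range(5)]
    initial = mean_squared_error(samples, w1, w2)
    for _ in range(epochs):
        for x, t in samples:
            hidden, out = forward(x, w1, w2)
            # Error terms for the output neuron and the intermediate neurons.
            delta_out = (t - out) * out * (1.0 - out)
            delta_hidden = [delta_out * w2[j] * hidden[j] * (1.0 - hidden[j])
                            for j in range(5)]
            # Change the weights little by little so that the difference shrinks.
            for j in range(5):
                w2[j] += rate * delta_out * hidden[j]
                for i in range(3):
                    w1[i][j] += rate * delta_hidden[j] * x[i]
    return w1, w2, initial, mean_squared_error(samples, w1, w2)

# Hypothetical (input color, instructor signal) pairs.
samples = [((0.9, 0.6, 0.5), 1.0), ((0.8, 0.5, 0.4), 0.9),
           ((0.1, 0.2, 0.8), 0.0), ((0.2, 0.8, 0.2), 0.1)]
w1, w2, initial_error, final_error = train(samples)
```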

The degrees of coincidence with the flesh color, which were obtained in accordance with the outputs of 
this neural network, are indicated as color level lines in the chromaticity diagram illustrated in Figure 67. 
The degree of coincidence in color is 1.0 for a color, which is closest to the flesh color. The degree of 
coincidence in color decreases as a color becomes different from the flesh color. Operations for expressing 
the information about the three primary color values of a color as x and y coordinates on the 
chromaticity diagram are carried out with the formulas 

$$x = \frac{1.1302 + 1.6387\,\frac{r}{r+g+b} + 0.6215\,\frac{g}{r+g+b}}{6.7846 - 3.0157\,\frac{r}{r+g+b} - 0.3857\,\frac{g}{r+g+b}} \qquad (5)$$

$$y = \frac{0.0601 + 0.9399\,\frac{r}{r+g+b} + 4.5306\,\frac{g}{r+g+b}}{6.7846 - 3.0157\,\frac{r}{r+g+b} - 0.3857\,\frac{g}{r+g+b}} \qquad (6)$$

where r represents the red color value, g represents the green color value, and b represents the blue color 
value. 
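
Formulas (5) and (6) may be evaluated as in the following sketch. The function name `rgb_to_xy` is hypothetical, and the grouping of terms assumes that each formula is a single quotient whose numerator and denominator are linear in r/(r+g+b) and g/(r+g+b), as the coefficients suggest.

```python
def rgb_to_xy(r, g, b):
    """Chromaticity coordinates (x, y) from the three primary color values,
    following the coefficients of formulas (5) and (6)."""
    total = r + g + b
    if total == 0:
        raise ValueError("chromaticity is undefined for r = g = b = 0")
    rn, gn = r / total, g / total
    denom = 6.7846 - 3.0157 * rn - 0.3857 * gn
    x = (1.1302 + 1.6387 * rn + 0.6215 * gn) / denom
    y = (0.0601 + 0.9399 * rn + 4.5306 * gn) / denom
    return x, y

white_xy = rgb_to_xy(1.0, 1.0, 1.0)
red_xy = rgb_to_xy(1.0, 0.0, 0.0)
```

Under this reading, equal primary color values map to approximately (1/3, 1/3), i.e., the center of the chromaticity diagram.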

By use of the weights of the synaptic connections determined from the learning operations in the 
manner described above, the degree of coincidence in color between the extracted region and the 
candidate for the predetermined object image is detected by the portion of the neural network constituted of 
the c"-, d"-, and e"-layers. In this manner, the degree of coincidence in color can be detected as the level 
of excitation of the neuron of the e"-layer. 

The f"-layer is constituted of the interactive connection layer having local area limited, interactive 
weighted connections for competition and cooperation such that a region, which exhibits a high degree of 
coincidence in color with the candidate for the predetermined object image and which has a shape and a 
size most appropriate for the candidate for the predetermined object image, may be selected. As indicated 
by a synaptic weights pattern 80 shown in Figure 42, each neuron in the f"-layer is synaptically connected 
with a positive weight to a nearby neuron and with a negative weight to a slightly remote neuron. Excitatory 
signals are transferred through such interactive connections in the f"-layer. As a result, in the f"-layer, of the 
neurons which have received signals from excited neurons in the e"-layer that represent high degrees of 
coincidence in color with the candidate for the predetermined object image, a region of a group of neurons 
having the predetermined shape and size and a high intensity of excitation, i.e., a high degree of 
coincidence in color, is selected and caused to fire, whereas neurons falling within an isolated region or a 
region exhibiting a low degree of coincidence in color with the candidate for the predetermined object 
image do not fire. By way of example, of regions 81, 82, and 83 extracted by the e"-layer, which are shown 
in Figure 42, the degree of coincidence in color between the region 81 and the flesh color is assumed as 
being 1.0 in the aforesaid chromaticity diagram. Also, the degree of coincidence in color between the region 
82 and the flesh color is assumed as being 0.9, and the degree of coincidence in color between the region 
83 and the flesh color is assumed as being 0.2. In such cases, because the region 83 has a lower degree of 
coincidence in color with the flesh color than the regions 81 and 82, the region 83 is inhibited and erased 
when the region 83 and the regions 81, 82 are caused to compete with each other. Also, because the region 82 has 
a lower degree of coincidence in color with the flesh color than the region 81, the region 82 is erased when 
the region 82 and the region 81 are caused to compete with each other. Additionally, because the region 81 
has a high degree of coincidence in color with the flesh color, the neurons corresponding to the region 81 
cooperate with one another, and the level of excitation of these neurons becomes higher. Therefore, only the 
region 81 is selected as a result of the competition and cooperation carried out in the f"-layer. 
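
The outcome of the competition and cooperation in the example above may be illustrated with the following region-level sketch. It is a deliberate simplification: each region is reduced to a single excitation level instead of a two-dimensional group of neurons, and the update rule, the rate, and the inhibition strength are assumptions chosen for illustration, not the synaptic weights pattern 80.

```python
def compete(levels, rate=0.5, inhibition=1.0, steps=20):
    """Region-level winner-take-all. Each region reinforces itself (cooperation)
    and is inhibited by the summed excitation of the other regions (competition).
    Levels are kept within the range 0.0 to 1.0."""
    levels = list(levels)
    for _ in range(steps):
        total = sum(levels)
        levels = [
            min(1.0, max(0.0, e + rate * (e - inhibition * (total - e))))
            for e in levels
        ]
    return levels

# Degrees of coincidence in color for the regions 81, 82, and 83.
result = compete([1.0, 0.9, 0.2])
```

In this sketch the region with the level 0.2 is erased first, the region with the level 0.9 is erased next, and only the region with the level 1.0 remains excited.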

In the f"-layer, in cases where the degrees of coincidence in color between the regions 81, 82, 83 and 
the flesh color are equal to one another, a region is selected which best conforms to the shape and the size 
of the synaptic weights pattern 80. For example, in cases where the shape and the size of the positive 
weight portion of the synaptic weights pattern 80 approximately coincide with the shape and the size of the 
region 81, the regions 82 and 83 are erased during the competition and the cooperation, and only the 
region 81 is selected. 

Also, in the f"-layer, in cases where the degrees of coincidence in color between the regions 81, 82, 83 
and the flesh color are equal to one another and the shapes and the sizes of these regions approximately 
coincide with one another, weights shown in Figure 43 are given to the neurons of the f"-layer. Specifically, 
weights W are applied such that the weight of connection from a neuron, which falls in an area of 
approximately 30% to 40% of the radius of the view window, may be largest. By such application of the 
weights, of the regions 81, 82, and 83 located at the positions shown in Figure 42, only the region 81 that is 
closest to the area of the largest weight is selected. 
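
The distance-dependent weighting may be sketched as follows. The Gaussian profile, its width, the function names, and the region distances are assumptions made for illustration; the specification states only that the weight is largest at approximately 30% to 40% of the radius of the view window.

```python
import math

def radial_weight(distance, window_radius, peak_fraction=0.35, width_fraction=0.15):
    """Weight that is largest at about 30% to 40% of the view-window radius.
    The Gaussian shape and its width are assumptions, not taken from Figure 43."""
    peak = peak_fraction * window_radius
    width = width_fraction * window_radius
    return math.exp(-((distance - peak) ** 2) / (2.0 * width ** 2))

def select_region(distances, window_radius):
    """Return the index of the region whose distance from the center point
    of the view window receives the largest weight."""
    weights = [radial_weight(d, window_radius) for d in distances]
    return max(range(len(distances)), key=lambda i: weights[i])

# Hypothetical distances of the regions 81, 82, and 83 (window radius 100).
chosen = select_region([40.0, 75.0, 90.0], 100.0)
```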

How a vector for the travel of the view window is determined from components of the firing region of the 
group of neurons exhibiting a high degree of coincidence in color with the candidate for the predetermined 
object image, which region has been selected in the manner described above, will be described 
hereinbelow. 

Figure 44 is an explanatory view showing in more detail the f"-layer, the g"-layer, and the i"-layer of the 
neural network shown in Figure 39, which is employed in this embodiment. 

As illustrated in Figure 44, the respective layers are connected to one another through synaptic 
connections. The f"-layer has the neurons arrayed two-dimensionally. The distribution of excited neurons in 
the f"-layer corresponds to the components of the region, which has a high degree of coincidence in color 
and has the shape and the size most appropriate for the candidate for the predetermined object image. This 
region has been selected by the interactive synaptic connections in the f"-layer. 

In order for an azimuth vector to be detected, excitatory signals are transmitted from the f"-layer to the 
g"-layer. The g"-layer is constituted of one-dimensionally arrayed neurons and is connected to the f"-layer 
through synaptic connections so as to receive the signals from the group of neurons, which are present in 
the same azimuth in the f"-layer. As a result, in the g"-layer, a neuron 86, which corresponds to the azimuth 
and the distance of the group of the neurons 85, is excited. The azimuth of the neuron 86 and the intensity 
of its excitation represent the azimuth and the distance. 

Thereafter, in the i"-layer, a vector is composed from the azimuth and the distance, which have been 
detected by the g"-layer in the manner described above, and the vector for the travel of the view window is 
thereby determined. The composition of the vector for the travel of the view window may be carried out in 
the polar coordinate system or in the Cartesian plane. In Figure 44, the vector for the travel of the view 
window is composed in the Cartesian plane. The excitatory signal coming from the neuron 86 is 
transformed inversely to the complex-log mapping and transmitted to the i"-layer. In the i"-layer, the center 
of gravity of each azimuth vector is found and is fed out as the vector for the travel of the view window. 
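
The composition of the vector in the Cartesian plane may be sketched as follows. In this sketch each excited neuron of the g"-layer is represented by a tuple of azimuth, log-distance, and excitation level; this explicit representation and the function name `travel_vector` are assumptions made for illustration, whereas in the embodiment the distance is conveyed by the intensity of excitation.

```python
import math

def travel_vector(azimuth_excitations):
    """azimuth_excitations: list of (azimuth in radians, log-distance, level).
    Each entry is transformed inversely to the complex-log mapping into the
    Cartesian plane, and the excitation-weighted center of gravity of the
    resulting azimuth vectors is returned as the travel vector."""
    total = sum(level for _, _, level in azimuth_excitations)
    if total == 0.0:
        return (0.0, 0.0)  # nothing is excited: the view window does not move
    vx = sum(level * math.exp(rho) * math.cos(theta)
             for theta, rho, level in azimuth_excitations)
    vy = sum(level * math.exp(rho) * math.sin(theta)
             for theta, rho, level in azimuth_excitations)
    return (vx / total, vy / total)

# A single excited neuron at azimuth 0 and distance 10.
vector = travel_vector([(0.0, math.log(10.0), 1.0)])
```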

In cases where the vector for the travel of the view window is to be determined with a neural network in 
which the aforesaid competition and the aforesaid cooperation are not carried out, it often occurs that a 
plurality of vectors are detected. In such cases, the weights shown in Figure 43, i.e., the weights according 
to the distance from the center point of the view window, may be applied to the neurons of the f"-layer such 
that a single vector for the travel of the view window may be selected. 

Also, in cases where the competition and the cooperation are carried out and a plurality of vectors are 
detected, the weights shown in Figure 43 may be applied to the neurons of the f"-layer such that a single 
vector for the travel of the view window may be selected. In this manner, even if a plurality of vectors for the 
travel of the view window are detected, the vector for the travel of the view window towards the most 
appropriate candidate for the predetermined object image can be detected. In cases where only a single 
vector for the travel of the view window could be detected, if the distance between the view window and the 
candidate for the predetermined object image is large, the magnitude of the vector for the travel of the view 
window becomes small. 

How the view window travels towards the candidate for the predetermined object image in accordance 
with the vector for the travel of the view window, which has been determined from the color of the candidate 
for the predetermined object image in the manner described above, will be described hereinbelow. 

Figures 45, 46, and 47 are explanatory views showing complex-log mapped planes and corresponding 
Cartesian planes as an aid in explaining how the center point of a view window travels from the side 
outward from a candidate for a predetermined object image to the side inward from the candidate for the 
predetermined object image. 

As illustrated in Figure 45, in cases where a candidate 71 for the predetermined object image does not 
include the center point Q of a view window 73, the candidate 71 for the predetermined object image is 
transformed as a single mass with the complex-log mapping and is detected as a region, which has a high 
degree of coincidence in color with the candidate for the predetermined object image, in the complex-log 
mapped plane shown in Figure 45. In the f"-layer, which is an interactive connection circuitry layer for 
competition and cooperation, an operation for narrowing a candidate 71' for the predetermined object image 
down is carried out. As a result, a single candidate region or a plurality of candidate regions, which exhibit a 
high degree of coincidence in color with the candidate for the predetermined object image and have an 
appropriate size, are selected. (In this embodiment, a single such candidate region is selected.) Thereafter, 
a vector is composed from the azimuth and the distance of the selected region, and a vector 75 for the 
travel of the view window 73 is thereby obtained. The vector 75 for the travel of the view window 73 is 
directed to the candidate 71 for the predetermined object image. Therefore, when the view window 73 is 
moved in accordance with the magnitude of the vector 75 for the travel of the view window 73, the center 
point Q of the view window 73 can approach the candidate 71 for the predetermined object image. 

As illustrated in Figure 46, the center point Q of the view window 73 and the candidate 71 for the 
predetermined object image are brought to positions which are comparatively close to each other, and the 
whole candidate 71 for the predetermined object image comes into the region inside of the view window 73. 
In such cases, the complex-log mapped candidate 71' for the predetermined object image is detected as a 
region, which extends in every azimuth and which has a high degree of coincidence in color with the 
candidate for the predetermined object image. In this state, the candidate 71' for the predetermined object 
image is narrowed down by the interactive connection circuitry for competition and cooperation. During the 
operation for narrowing the candidate 71' for the predetermined object image down, regions of the same 
candidate extending in all azimuths inhibit one another. As a result, only the group of neurons 
corresponding to the azimuths, in which the distance between the outer edge of the candidate 71' for the 
predetermined object image and the center point Q of the view window 73 is largest, can be excited. A 
region 87, which includes this group of neurons, is detected. Thereafter, the azimuth and the distance are 
detected from the azimuth and the intensity of excitation of the detected region 87. In this manner, a vector 
75 for the travel of the view window 73 is obtained. The vector 75 for the travel of the view window 73 is 
directed to the center point of the candidate 71 for the predetermined object image. Therefore, when the 
view window 73 is moved in accordance with the magnitude of the vector 75 for the travel of the view 
window 73, the center point Q of the view window 73 can even further approach the candidate 71 for the 
predetermined object image. 

Finally, as illustrated in Figure 47, the center point Q of the view window 73 and the candidate 71 for 
the predetermined object image are brought to approximately the same position. In such cases, the 
complex-log mapped candidate 71' for the predetermined object image is detected as a region, which 
extends uniformly in every azimuth and which approximately coincides in color with the candidate for the 
predetermined object image. In this state, the candidate 71' for the predetermined object image is narrowed 
down by the interactive connection circuitry for competition and cooperation. During the operation for 
narrowing the candidate 71' for the predetermined object image down, regions of the same candidate 
extending uniformly in all azimuths compete with and inhibit one another. As a result, no region can be 
excited. Therefore, the magnitude and the distance detected next become zero. Also, the magnitude of the 
vector for the travel of the view window 73 becomes zero. In this case, during the operation for moving the 
view window 73 in accordance with the magnitude of the vector for the travel of the view window, the travel 
of the center point Q of the view window 73 stops at the center point of the candidate 71 for the 
predetermined object image. 

The neural network employed in this embodiment iterates the operations described above and 
ultimately moves the center point of the view window to the center point of the candidate for the 
predetermined object image. 
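
The iteration described above may be illustrated with the following sketch, in which the view window is moved repeatedly until the vector for the travel becomes zero. The travel vector is modeled here simply as the offset from the center point of the view window to the centroid of the candidate, which is a stand-in for the output of the neural network; the gain, the tolerance, and the sample points are assumptions.

```python
def centroid(points):
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)

def move_view_window(center, candidate_points, gain=0.5, tolerance=1e-6, max_steps=200):
    """Move the center point by a fraction of the travel vector at each step
    and stop when the magnitude of the vector is approximately zero."""
    cx, cy = center
    tx, ty = centroid(candidate_points)
    for _ in range(max_steps):
        vx, vy = tx - cx, ty - cy          # stand-in for the network output
        if (vx * vx + vy * vy) ** 0.5 < tolerance:
            break                          # the travel of the view window stops
        cx, cy = cx + gain * vx, cy + gain * vy
    return (cx, cy)

# Hypothetical candidate whose center point is (10, 6).
final_center = move_view_window(
    (0.0, 0.0), [(8.0, 6.0), (12.0, 6.0), (10.0, 10.0), (10.0, 2.0)]
)
```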

In the manner described above, the three types of the vectors for the travel of the view window are 
obtained from the neural network, wherein the vector for the travel of the view window is calculated from the 
movement of the candidate for the predetermined object image, the neural network, wherein the vector for 
the travel of the view window is calculated from the contour line of the candidate for the predetermined 
object image, and the neural network, wherein the vector for the travel of the view window is calculated from 
the color of the candidate for the predetermined object image. Thereafter, these three types of the vectors 
for the travel of the view window are combined with one another in a manner suitable for the candidate for 
the predetermined object image, which is to be found. In this manner, an ultimate vector for the travel of the 
view window is determined. The view window is caused to travel towards the candidate for the 
predetermined object image in accordance with the ultimately determined vector for the travel of the view window. 
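
One possible way of combining the three vectors is a weighted average, sketched below. The specification states only that the vectors are combined in a manner suitable for the candidate to be found; the weighted average, the function name `combine_vectors`, and the weight values are therefore assumptions made for illustration.

```python
def combine_vectors(movement, contour, color, weights=(1.0, 1.0, 1.0)):
    """Weighted average of the three travel vectors. The weights are
    hypothetical and would be chosen to suit the kind of candidate sought."""
    total = sum(weights)
    if total == 0:
        raise ValueError("at least one weight must be non-zero")
    vectors = (movement, contour, color)
    x = sum(w * v[0] for w, v in zip(weights, vectors)) / total
    y = sum(w * v[1] for w, v in zip(weights, vectors)) / total
    return (x, y)

# Example in which the color-based vector is given twice the weight of the others.
ultimate = combine_vectors((4.0, 0.0), (2.0, 2.0), (0.0, 4.0), weights=(1.0, 1.0, 2.0))
```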

The neural network employed in this embodiment iterates the operations described above and 
ultimately moves the center point of the view window to the center point of the candidate for the 
predetermined object image. In the processing unit 4 for judgment as to the stop state of the view window, 
when the center point of the view window coincides with the center point of the candidate for the 
predetermined object image, the candidate for the predetermined object image is cut out (i.e., extracted). 

The information representing the candidate for the predetermined object image, which has thus been 
extracted, is fed into a judgment means. 

In the method for recognizing an object image in accordance with the present invention, the candidate 
for the predetermined object image can be cut out regardless of the size of the candidate for the 
predetermined object image and the positions of feature parts included in the candidate for the 
predetermined object image. However, such a way of cutting out the candidate for the predetermined object image 
results in a heavy burden to the next judgment step. Therefore, when a candidate for the predetermined 
object image is to be cut out, it should preferably be normalized such that the size of every candidate for 
the predetermined object image may become approximately the same and positions of feature parts may 
be approximately the same among the candidates for the predetermined object image. In this manner, the 
burden on the judgment means, which makes a judgment as to what the candidate for the predetermined 
object image is, can be reduced. Also, the time required for the operation can be kept short. 

Also, in the embodiment described above, the size of the view window may be changed in accordance 
with the size of the object, i.e., the candidate for the predetermined object image. For example, the position 
of the contour line extending in the annular direction, which line has been detected in the aforesaid e1-layer, 
with respect to the radial direction in the complex-log mapped plane may be detected. In this manner, the 
distance between the center point of the view window and the contour line of the candidate for the 
predetermined object image may be detected. Thereafter, the size of the view window may be changed in 
accordance with the detected distance. 

By way of example, as illustrated in Figure 48, in cases where a human body image 88 is to be 
extracted from an image 89, in which the human body image 88 is embedded, the center point of the view 
window is first located at a point Q1. Thereafter, the center point of the view window travels through points 
Q2, Q3, and Q4 towards the human body image 88. The center point of the view window then stops at a 
center point Q5 of a face pattern 88a of the human body image 88. The face pattern 88a is normalized and 
extracted by the processing unit 4 for judgment as to the stop state of the view window. The information 
representing the extracted face pattern 88a is fed into the judgment means. In such cases, the term 
"normalization of a face pattern 88a" means the process, with which the face patterns having various 
shapes (e.g., patterns of large faces, small faces, and long faces) are converted into circular face patterns 
having approximately the same size by compression or enlargement such that the positions of eye and 
mouth patterns serving as feature parts of the face patterns may become approximately the same among 
various face patterns. 

How the candidate for the predetermined object image is normalized will be described hereinbelow. 

In cases where importance is attached to the f1-layer of the neural network shown in Figure 4, the 
candidate for the predetermined object image is normalized in the manner described below. In this case, 
the viewpoint travels to and stops at the center point of the candidate for the predetermined object image. 
In cases where the center point of the view window has traveled to the center point of a human face image 
serving as the candidate for the predetermined object image, as indicated at the right parts of Figures 49A 
and 49B, with the complex-log mapping, the contour lines of a small face image 170 and a large face 
image 171 are transformed into straight lines 170' and 171', which are located respectively at distances R1 
and R2 from the center point of the view window. Normalization is carried out by rendering the distances R1 
and R2 approximately equal to each other. For these purposes, the process described below is carried out. 

Figure 50 is an explanatory view showing how the normalization is carried out in this embodiment. 
As illustrated in Figure 50, in cases where the center point of the view window travels to the center point 
35 of the circular human face image, neurons 172 of the f1 -layer of the neural network are excited which are 
located in the annular direction at approximately equal distances from the center point of the view window. 
Along each of the rows of the array of the neurons of the f1 -layer, which rows regularly stand side by side 
in the radial direction, the outputs of the neurons arrayed in the annular direction are added to one another 
by synaptic connections 173. As a result, of the neurons 174 representing the sums of the additions, only 
40 the neuron 175, which is located at a distance Re from the center point of the view window, generates an 
output The distance Re represents the distance from the center point of the human face image to the 
contour line The distance Re is transformed inversely to the complex-log mapping into a distance In 'R9. 
An expansion and contraction factor k ( = L/Re) is then found, with which the distance In 'Re having been 
obtained from the transformation inverse to the complex-log mapping can be normalized to a predetermined 
45 value L When the human face image, which serves as the candidate for the predetermined object image, is 
extracted in the Cartesian plane, the extraction is carried out such that human face images having different 
sizes may become approximately the same size in accordance with the expansion and contraction factor k. 
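
The size normalization described above can be sketched as follows. This is a minimal illustration only, not the circuit of Figure 50. The list-of-lists array `logpolar`, the row spacing `rho_step`, and both function names are assumptions introduced here. The sketch further assumes that the factor k is taken against the inverse-mapped distance, so that k times the recovered radius equals the predetermined value L.

```python
import math

def ring_sums(logpolar):
    """Add the outputs along the annular direction, one sum per radial row."""
    return [sum(row) for row in logpolar]

def expansion_factor(logpolar, rho_step, target_size):
    """Return the factor k that scales the detected contour radius to target_size.

    logpolar[i][j] is the f1-layer output at radial row i and angle index j.
    rho_step is the assumed spacing of the rows along the log-radius axis.
    """
    sums = ring_sums(logpolar)
    peak_row = max(range(len(sums)), key=lambda i: sums[i])  # the row that fires
    mapped_distance = peak_row * rho_step       # distance in the mapped plane
    radius = math.exp(mapped_distance)          # inverse of the log mapping
    return target_size / radius
```

A small face and a large face excite different rows, so they yield different values of k, and scaling each extraction by its own k brings both to about the same size.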

The shape of a human face varies for different persons. As illustrated in Figure 51, when the center 
point O of the view window has traveled to the center point of a long elliptic face image 176, the contour 
line of the long face image 176 is transformed with the complex-log mapping into a curve 176'. The distance 
between the center point of the view window and the curve 176' varies in accordance with the angle with 
respect to the annular direction (in this example, the maximum value is R3, and the minimum value is R4). 
In such cases, an expansion and contraction factor k (= L/Rθ), with which the distance ln⁻¹Rθ having been 
obtained from the transformation inverse to the complex-log mapping can be normalized to the predetermined 
value L, is found for each angle with respect to the annular direction. When the long face image 176 
is extracted in the Cartesian plane, the extraction is carried out such that human face images having 
different shapes and sizes may become approximately the same shape and approximately the same size in 
accordance with the expansion and contraction factor k for each angle. 
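
For a non-circular contour, the same idea can be applied angle by angle. The sketch below is hypothetical. It assumes a list-of-lists array `logpolar` indexed as [radial row][angle], an assumed row spacing `rho_step`, and the same reading of k as a ratio against the inverse-mapped distance.

```python
import math

def per_angle_factors(logpolar, rho_step, target_size):
    """Return one expansion and contraction factor per angle index."""
    n_rows = len(logpolar)
    n_angles = len(logpolar[0])
    factors = []
    for j in range(n_angles):
        # Radial row at which the contour crosses this angle.
        peak_row = max(range(n_rows), key=lambda i: logpolar[i][j])
        radius = math.exp(peak_row * rho_step)   # inverse of the log mapping
        factors.append(target_size / radius)
    return factors
```

Angles at which the contour lies farther out (toward R3) receive a smaller factor than angles at which it lies closer in (toward R4), which is what pulls a long face toward the common shape.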
How the normalization is carried out in cases where importance is attached to the output of the f2-layer 
will be described hereinbelow. As illustrated in Figure 52A, in this case, the center point O of a view 
window 178 travels to and stops at a point of intersection T of contour lines of a candidate 177 for a 
predetermined object image. The point of intersection T of the contour lines constitutes a portion of the 
candidate 177 for the predetermined object image. In order to extract such a portion, information concerning 
the size of the entire candidate 177 for the predetermined object image is required. Therefore, in the 
processing unit 4 for judgment as to the stop state of the view window, the balance of the output 
composition is temporarily changed such that importance may be attached to the output of the f1-layer, and 
a vector for the travel of the view window towards the center point of the candidate 177 for the predetermined 
object image is determined. Also, searching of the center point of the candidate 177 for the predetermined 
object image is begun. 

At this time, the center point O of the view window is already at a portion of the contour lines of the 
candidate 177 for the predetermined object image. Therefore, if the candidate 177 for the predetermined 
object image has a size such that it is completely accommodated in the view window 178, the center point 
of the candidate 177 for the predetermined object image can be found reliably with the searching. 

However, as shown in Figure 52A, if the size of the candidate 177 for the predetermined object image is 
larger than the size of the view window 178, a signal representing the vector for the travel of the view 
window towards the center point of the candidate 177 for the predetermined object image cannot be 
obtained. Whether the signal representing the vector for the travel of the view window is or is not accurately 
obtained can be known by investigating the states of excitation of the neurons of the e1-layer. Specifically, 
as illustrated in Figure 52A, in cases where no neuron is excited above a predetermined threshold value in 
the e1-layer, it is judged that the entire candidate 177 for the predetermined object image is larger than the 
view window 178. In such cases, a signal for enlarging the region of the image of the outer world, which 
image is cut out by the a-layer, i.e., the view window 178, may be fed into the a-layer such that the entire 
candidate 177 for the predetermined object image can be found in the region inside of the view window 
178. As illustrated in Figure 52B, in cases where the entire candidate 177 for the predetermined object 
image comes into the region inside of an enlarged view window 178', the neurons of the e1-layer are 
excited, and a correct vector for the travel of the view window towards the center point of the whole 
candidate 177 for the predetermined object image is obtained. In this manner, as shown in Figure 52C, the 
neural network can cause the center point O of the view window 178' to travel towards the center point of 
the candidate 177 for the predetermined object image. 
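
The enlargement rule above amounts to a simple loop. The following is a sketch under stated assumptions. `e1_response` is a hypothetical callable standing in for the e1-layer, and the growth factor and upper size limit are illustrative values that do not appear in the text.

```python
def enlarge_until_excited(e1_response, size, threshold, growth=2.0, max_size=4096.0):
    """Grow the view window until some e1-layer neuron exceeds the threshold.

    e1_response(size) is a hypothetical callable that returns the e1-layer
    outputs obtained with a view window of the given size.
    Returns the first size that works, or None if max_size is exceeded.
    """
    while size <= max_size:
        if max(e1_response(size)) > threshold:
            return size
        size *= growth
    return None

def toy_e1(size):
    """Toy stand-in: the candidate fits once the window is at least 100 wide."""
    return [1.0] if size >= 100 else [0.0]
```

With the toy response, a window that starts at 32 is doubled twice before any neuron is excited, after which the travel vector towards the center point can be trusted.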

Thereafter, in the same manner as that described above, the size R5 of the candidate 177 for the 
predetermined object image is calculated from the distance between the center point O of the view window 
178' and the contour line extending in the annular direction. Thereafter, as illustrated in Figure 52D, the 
center point O of the view window is returned to the point of intersection T of the contour lines, which point 
was found previously. A region, which has its center point at the point of intersection T of the contour lines 
and has a predetermined size ratio to the size of the entire candidate 177 for the predetermined object 
image (in this example, the region having a radius of k·ln⁻¹R5, where k represents the expansion and 
contraction factor), is extracted. 

In the manner described above, the face image, which serves as the candidate for the predetermined 
object image, is normalized and extracted. The signal representing the candidate for the predetermined 
object image is fed into the judgment means for judging whether the candidate for the predetermined object 
image is or is not the predetermined object image. 

How a judgment is made in the second step of the method for recognizing an object image in 
accordance with the present invention as to whether a candidate for the predetermined object image is or is 
not the predetermined object image will be described hereinbelow. In this embodiment, by way of example, 
a face image is taken as the predetermined object image, and a judgment is made as to whether a 
candidate for the predetermined object image, which has been extracted, is or is not the face image. 

Figure 53 is an explanatory view showing an example of a neural network for making a judgment in an 
embodiment of the second step of the method for recognizing an object image in accordance with the 
present invention as to whether a candidate for the predetermined object image is or is not the 
predetermined object image. In this embodiment, the neural network employs a neocognitron, which is 
unaffected by a shift in position, in order to make a judgment from feature parts of a candidate for the 
predetermined object image and the positions of the feature parts in the candidate for the predetermined 
object image as to whether the candidate for the predetermined object image is or is not the predetermined 
object image. 

In this embodiment, an image of a human face is employed as the predetermined object image. Also, a 
right eye pattern, a left eye pattern, and a mouth pattern are employed as a plurality of feature parts of the 
predetermined object image. Learning operations for the right eye pattern, the left eye pattern, and the 
mouth pattern are carried out with Kohonen's self-organization. The neocognitron is employed as the 
structure of the whole system. 

The neocognitron employed in this embodiment is constituted of three layers, i.e., a Us1-layer 95, a 
Uc1-layer 96, and a Us2-layer 98 which is the final output layer. An a1-mask 94 is located before the 
Us1-layer 95. In this embodiment, operations of the a1-mask 94 for learning a plurality of patterns have been 
carried out with Kohonen's self-organization with respect to a right eye pattern 91, a mouth pattern 92, 
and a left eye pattern 93, which are the feature parts of the face image. Therefore, judgments can be made 
as to various face images. Also, an a2-mask 97 is located before the Us2-layer 98 in order to investigate the 
positions of the right eye pattern 91, the mouth pattern 92, and the left eye pattern 93, which constitute the 
feature parts of a candidate 90 for the given face image. A judgment is thereby made as to whether the 
candidate 90 for the predetermined object image is or is not the face image. 

How the learning operations of the a1-mask 94 are carried out with Kohonen's self-organization will 
be described hereinbelow. As an aid in facilitating the explanation, the learning operations are described 
below only for a closed eye pattern and an open eye pattern. 

When a closed eye pattern 100 and an open eye pattern 101 as illustrated in Figure 54 are given to the 
a1-mask 94 one after the other, elements (cells) which best match with the closed eye pattern 100 and the 
open eye pattern 101 are found from the elements (cells) constituting the a1-mask 94. Thereafter, the 
degrees of matching between the best-matching elements (cells) and their topological neighbors (cells) are 
increased. In this manner, the closed eye pattern 100 and the open eye pattern 101 are allocated to the 
cells in the topological relationship. However, as illustrated in Figure 55, if the closed eye pattern 100 and 
the open eye pattern 101 are allocated to the cells in this manner, cells, which are located between the cells 
allocated with the closed eye pattern 100 and the cells allocated with the open eye pattern 101, will learn an 
eye pattern 102 which results from the superposition of the closed eye pattern 100 and the open eye 
pattern 101. In order for such a problem to be eliminated, as illustrated in Figure 56, when the cells which 
best match with the closed eye pattern 100 and the open eye pattern 101 are found from the cells 
constituting the a1-mask 94, and the degrees of matching between the best-matching cells and their 
topological neighbors are then increased, a pattern, which is intermediate between the closed eye pattern 
100 and the open eye pattern 101, i.e., a half-open eye pattern 103, is allocated to the cells which are 
learning the superposed eye pattern 102. 
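
The basic step of this learning, finding the best-matching cell and then raising the match of that cell and its topological neighbors, can be sketched with a generic one-dimensional Kohonen map. This is not the a1-mask 94 itself. The linear initial weights, the learning rate, the neighborhood radius, and the function names are all assumptions made for the illustration, and the allocation of the half-open eye pattern 103 is not modeled.

```python
def best_match(cells, pattern):
    """Index of the cell whose weights are closest to the pattern."""
    def distance(i):
        return sum((w - x) ** 2 for w, x in zip(cells[i], pattern))
    return min(range(len(cells)), key=distance)

def train_som(patterns, n_cells, epochs=20, rate=0.5, radius=1):
    """Train a one-dimensional Kohonen map and return its cell weights.

    n_cells must be at least 2 for the assumed linear initialization.
    """
    dim = len(patterns[0])
    # Assumed initialization: weights ramp linearly from 0 to 1 along the map.
    cells = [[i / (n_cells - 1)] * dim for i in range(n_cells)]
    for _ in range(epochs):
        for pattern in patterns:
            best = best_match(cells, pattern)
            lo, hi = max(0, best - radius), min(n_cells, best + radius + 1)
            for i in range(lo, hi):   # best cell and its topological neighbors
                cells[i] = [w + rate * (x - w) for w, x in zip(cells[i], pattern)]
    return cells
```

When an all-zero vector stands for the closed eye and an all-one vector for the open eye, the two patterns settle at opposite ends of a five-cell map. The cell between them is the position at which the text places the intermediate pattern.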

In the manner described above, by the application of the closed eye pattern 100 and the open eye 
pattern 101 as the inputs, the a1-mask 94 can learn the half-open eye pattern 103 while its topological 
relationship is being kept. The learning operations can be carried out in the same manner as that 
described above also for the mouth and left eye patterns. Additionally, by suitable selection of image 
patterns given during the learning operations, the learning operations can be carried out appropriately in 
accordance with a change in the facial expression and a change in the orientation of a face image. For 
example, a change in the facial expression can be learned by presenting typical eye and mouth patterns. A 
change in the orientation of a face image can be learned by presenting frontal and lateral face images. Also, 
a face image of a different person can be learned by presenting a typical face image. (In such cases, how 
to select typical feature patterns should be studied.) By the carrying out of the self-organization, flexibility 
can be given at the stage of the learning operations to the feature patterns which are used for the judgment. 

How the respective layers of the neocognitron neural network employed in this embodiment work will be 
described hereinbelow. First, a part 94A of the a1-mask 94, which has learned right eye patterns, carries out 
the convolution (i.e., matching with the patterns learned with Kohonen's self-organization) for the right 
eye pattern 91 in order to investigate whether the right eye pattern 91 is or is not present in the candidate 
90 for the face image, which is given as a candidate for the predetermined object image. The part 94A of 
the a1-mask 94 has learned various right eye patterns, such as a closed right eye pattern, an open right eye 
pattern, and a half-open right eye pattern. Also, a part 94B of the a1-mask 94 has learned various mouth 
patterns, and a part 94C of the a1-mask 94 has learned various left eye patterns, such that judgments can 
be made for various face images. When the convolution carried out on the candidate 90 for the face image 
by the a1-mask 94 reveals that the right eye pattern 91 is present in the candidate 90 for the face image, a 
part 95a corresponding to the right eye pattern in cells 95A, which correspond to the right eye pattern 91 
and which are contained in the Us1-layer 95, makes a response. Also, when it is found that the mouth 
pattern 92 is present in the candidate 90 for the face image, a part 95b corresponding to the mouth pattern 
in cells 95B, which correspond to the mouth pattern 92 and which are contained in the Us1-layer 95, makes 
a response. Additionally, when it is found that the left eye pattern 93 is present in the candidate 90 for the 
face image, a part 95c corresponding to the left eye pattern in cells 95C, which correspond to the left eye 
pattern 93 and which are contained in the Us1-layer 95, makes a response. When the parts 95a, 95b, and 
95c of the cells 95A, 95B, and 95C of the Us1-layer 95 make responses in the manner described above, 
response outputs, which represent that the right eye pattern, the mouth pattern, and the left eye pattern are 
present in the candidate 90 for the face image, are fed from the Us1-layer 95 into the Uc1-layer 96. 

The Uc1-layer 96 has synaptic connections such that, if at least a single responding cell is present in 
the part 95a, 95b, or 95c of the cells 95A, 95B, or 95C of the Us1-layer 95, the cells 96A, 96B, or 96C, 
which correspond respectively to the right eye pattern 91, the mouth pattern 92, or the left eye pattern 93, 
may generate an output. Specifically, as for the right eye pattern, when the part 95a of the cells 95A of the 
Us1-layer 95, which serves as the input layer, makes a response to the right eye pattern, the cells 96A, 
which correspond to the right eye pattern and which are contained in the Uc1-layer 96, generate a 
response output. The response output is generated from a region 96a of the cells 96A, which region is 
larger than the region of the part 95a of the cells 95A of the Us1-layer 95. Also, as for the mouth pattern 
and the left eye pattern, response outputs are generated in the same manner as that for the right eye 
pattern. Therefore, the response outputs are not much affected by a shift in the position of the input pattern. 
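
The way a larger response region gives tolerance to positional shift can be sketched as follows. This is a simplified, binary stand-in for the Uc1-layer connections. The function name and the `spread` parameter are assumptions introduced here.

```python
def c_layer(s_map, spread=1):
    """Spread each S-cell response over a larger region of C-cells.

    s_map is a binary 2-D list; a C-cell is set to 1 if any S-cell within
    `spread` rows and columns of it has responded.
    """
    rows, cols = len(s_map), len(s_map[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if s_map[r][c]:
                for rr in range(max(0, r - spread), min(rows, r + spread + 1)):
                    for cc in range(max(0, c - spread), min(cols, c + spread + 1)):
                        out[rr][cc] = 1
    return out
```

Because one responding cell turns on a 3 x 3 block, an input that is shifted by one cell still excites the same central C-cell.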

The response outputs of the Uc1-layer 96 are fed into the a2-mask 97 and then into the Us2-layer 98 
which serves as the final output layer. The a2-mask 97 is constituted of a mask 97A for a right eye pattern, 
a mask 97B for a mouth pattern, and a mask 97C for a left eye pattern. The masks 97A, 97B, and 97C carry 
out the convolution on the outputs from the corresponding cells 96A, 96B, and 96C of the Uc1-layer 96 
while the relationship among the positions of the right eye pattern, the mouth pattern, and the left eye 
pattern is being adjusted. The a2-mask 97 connects the Uc1-layer 96 and the Us2-layer 98 with each other 
through synaptic connections such that the Us2-layer 98 may respond only when the right eye pattern, the 
mouth pattern, and the left eye pattern are located at the same positions as those in a human face. Also, 
the a2-mask 97 not only investigates the condition that the right eye pattern, the mouth pattern, and the left 
eye pattern are detected by the Uc1-layer 96 but also confirms the condition that no other features are 
detected by the Uc1-layer 96. Therefore, the Us2-layer 98 responds only when the right eye pattern, the 
mouth pattern, and the left eye pattern are detected at the predetermined positions by the Uc1-layer 96. As 
a result of the response of the Us2-layer 98, it is judged that the candidate 90 for the face image is the 
human face image. Conversely, when the right eye pattern, the mouth pattern, and the left eye pattern are 
not detected by the Uc1-layer 96, or when the right eye pattern, the mouth pattern, and the left eye pattern 
are detected at positions different from the predetermined positions by the Uc1-layer 96, the Us2-layer 98 
makes no response, and therefore it is judged that the candidate 90 for the face image is not the human 
face image. 
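
The decision rule stated above (all feature parts present, at the predetermined positions, with no other features) can be written as a plain predicate. This is a hedged sketch and not the synaptic connections of the a2-mask 97. The dictionary layout, the feature names, and the `tolerance` parameter are assumptions introduced for the example.

```python
def us2_responds(detections, expected, tolerance=1):
    """Return True only if exactly the expected features sit at expected positions.

    detections maps a feature name to the (row, col) points where the
    preceding layer responded; expected maps a name to its (row, col) position.
    """
    detected = {name for name, points in detections.items() if points}
    if detected != set(expected):   # a feature is missing or an extra one fired
        return False
    for name, (exp_row, exp_col) in expected.items():
        near = any(abs(r - exp_row) <= tolerance and abs(c - exp_col) <= tolerance
                   for r, c in detections[name])
        if not near:
            return False
    return True
```

A candidate with the eyes and mouth swapped, with a missing mouth, or with an additional unexpected feature produces no response and is therefore rejected.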

In the embodiment described above, the judgment as to whether the candidate for the predetermined 
object image is or is not the predetermined object image is carried out with the neocognitron in accordance 
with feature parts of the candidate for the predetermined object image and the positions of the feature parts. 
The neural network for this purpose is not limited to the neocognitron, and may be constituted of a 
cognitron or a perceptron. Also, a neural network need not necessarily be employed for making a judgment, 
and any other process may be employed for this purpose. 

Also, in the embodiment described above, the learning operations for feature patterns are carried out by 
utilizing Kohonen's self-organization as the neural network. Alternatively, the learning operations for 
feature patterns may be carried out by utilizing any other neural network. As another alternative, any 
learning means other than a neural network may be employed. 

Additionally, in the embodiment described above, the method for recognizing an object image in 
accordance with the present invention is used to recognize a human face image. However, the method for 
recognizing an object image in accordance with the present invention is applicable also when any other 
object image is recognized. In such cases, if a neural network is utilized to make a judgment, and the 
learning operations are carried out in a manner suitable for the object image to be recognized, a judgment 
as to whether a candidate for the predetermined object image is or is not the predetermined object image 
can be made efficiently as in recognition of a human face image. 

Furthermore, in the aforesaid embodiment, the extraction of the candidate for the predetermined object 
image in the first step of the method for recognizing an object image in accordance with the present 
invention is carried out with the neural network by moving the center point of the view window having the 
predetermined size to the position of the candidate for the predetermined object image and determining the 
extraction area in accordance with the size and/or the shape of the candidate for the predetermined object 
image, the center point of the view window being taken as a reference during the determination of the 
extraction area. Alternatively, a candidate for the predetermined object image may be extracted by creating 
a map of a potential field of the whole image and determining the extraction area in accordance with the 
size and/or the shape of the candidate for the predetermined object image, a minimum point of the potential 
in the map being taken as a reference during the determination of the extraction area. 
As described above, the potential field represents the field of a point, at which a viewpoint of human 
eyes is stable. When the potential field is considered with respect to the aforesaid neural network for 
extracting the candidate for the predetermined object image, the potential field can be regarded as the field 
of a point, at which the center point of the view window is stable. Specifically, the potential field can be 
regarded as being flat at the center point of the candidate for the predetermined object image, at which 
point the center point of the view window is stable. Also, the potential field can be regarded as being 
gradient at a point, at which the center point of the view window is unstable and from which the view 
window is required to travel. In such cases, the thing that expresses the gradient of the potential field is the 
direction of the vector for the travel of the view window, which vector is employed in the aforesaid neural 
network for the extraction. Therefore, the map of the potential field of the image can be created by scanning 
the whole area of the given image with the view window of the aforesaid neural network and recording the 
vectors for the travel of the view window, which vectors are found at respective points in the image, on the 
image as gradient vectors of the potential field. 
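
The scanning procedure above can be sketched as follows. In this sketch the neural network is replaced by a toy function that merely steps toward the nearest known center, so both `toward_nearest` and the dictionary-based map are assumptions made for the illustration and do not represent the network that computes the travel vector.

```python
def toward_nearest(centers):
    """Toy stand-in for the network: unit step toward the nearest center."""
    def sign(v):
        return (v > 0) - (v < 0)

    def travel_vector(x, y):
        cx, cy = min(centers, key=lambda c: (c[0] - x) ** 2 + (c[1] - y) ** 2)
        return (sign(cx - x), sign(cy - y))

    return travel_vector

def gradient_map(width, height, travel_vector):
    """Scan every point and record the travel vector found there."""
    return {(x, y): travel_vector(x, y)
            for y in range(height) for x in range(width)}
```

The recorded vector is zero at a center point, where the view window would be stable, and points toward the center everywhere else.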

Figure 57 is an explanatory view showing a map of a potential field created by using the vectors for the 
travel of the view window, which have been found in the manner described above, as gradient vectors of the 
potential field and recording the gradient vectors of the potential field on the image. 

With reference to Figure 57, an elliptic image 105, a triangular image 106, and a square image 107 are 
embedded in an image 104. Gradient vectors of the potential field over the whole image 104 are calculated 
with the aforesaid neural network and are recorded at respective positions in the image 104, which 
correspond to the positions at which the gradient vectors of the potential field have been found. In this 
manner, a map of the potential field is obtained. In Figure 57, the gradient vectors of the potential field are 
directed to center points P1, P2, and P3 of the elliptic image 105, the triangular image 106, and the square 
image 107. When the directions of the gradient vectors of the potential field are regarded as the gradients 
of the potential field, minimum points of the potential field can be found at the center points P1, P2, and P3 
of the elliptic image 105, the triangular image 106, and the square image 107. 

Figure 58 is a graph showing the gradients of the potential field. In Figure 58, the image 104 shown in 
Figure 57 is scanned with a view window 73, and a gradient vector 75 of the potential field at a center point 
N of the view window 73 is being directed to the center point P3 of the square image 107. 

A graph 108 at the upper part of Figure 58 is a two-dimensional graph of the potential field obtained by 
finding the gradients of the potential field in the manner described above and taking a cross-section of the 
potential field along the line I-I of the image 104. 

The graph 108 has three minimum points P1', P2', and P3', which correspond respectively to the center 
point P1 of the elliptic image 105, the center point P2 of the triangular image 106, and the center point P3 
of the square image 107. Also, a point N' in the graph 108 corresponds to the center point N of the view 
window on the image 104. In this manner, the minimum points of the potential field can be found by 
calculating the gradients of the potential field of the whole image 104 and creating the map of the potential 
field. Candidates for the predetermined object images are then extracted in accordance with the sizes 
and/or the shapes of the candidates for the predetermined object images by taking the minimum points of 
the potential field as references. In this manner, the extraction of the candidates for the predetermined 
object images can be carried out very efficiently. 
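
One way to locate the minimum points from a recorded map is to follow the gradient vectors from each scanned point until a point with a zero vector is reached. The sketch below assumes a hypothetical map stored as a dictionary from integer (x, y) points to integer step vectors. It illustrates the idea only and is not the procedure of the embodiment.

```python
def descend(gradient_map, start, max_steps=1000):
    """Follow the recorded vectors from start until a stable point is reached."""
    x, y = start
    for _ in range(max_steps):
        dx, dy = gradient_map.get((x, y), (0, 0))
        if dx == 0 and dy == 0:
            return (x, y)          # flat: the view window would stop here
        x, y = x + dx, y + dy
    return None                    # no stable point found within max_steps

def minimum_points(gradient_map):
    """Collect the distinct stable points reached from every scanned point."""
    reached = (descend(gradient_map, point) for point in gradient_map)
    return sorted({m for m in reached if m is not None})
```

On a one-dimensional cross-section with two centers, comparable to the line I-I, every starting point ends at one of the two minima, which then serve as references for extraction.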

A second embodiment of the first step for extracting a candidate for a predetermined object image in 
the method for recognizing an object image in accordance with the present invention will be described 
hereinbelow. 

With the second embodiment of the first step for extracting a candidate for a predetermined object 
image in the method for recognizing an object image in accordance with the present invention, in the 
embodiment wherein the vector for the travel of the view window is calculated from the movement of the 
candidate for the predetermined object image, the extracted contour lines of the candidate for the 
predetermined object image, which candidate shows a movement different from the movement of the 
background, are not subjected to transformation with the complex-log mapping, and a vector for the travel of 
the view window is determined by using a neural network having a Cartesian plane-like neuron arrangement. 
Also, in the embodiments wherein the vectors for the travel of the view window are calculated from the 
contour line of the candidate for the predetermined object image and from the color of the candidate for the 
predetermined object image, the image at the time t, which image has been cut out and which falls within 
the region inside of the view window having the predetermined size, is not subjected to transformation with 
the complex-log mapping, and a vector for the travel of the view window is determined by using a neural 
network having a Cartesian plane-like neuron arrangement. 

Figure 59 is an explanatory view showing an example of a neural network for extracting a candidate for 
a predetermined object image by moving the center point of a view window to the center point of the 
candidate for the predetermined object image in the second embodiment of the first step in the method for 
recognizing an object image in accordance with the present invention. First, a neural network will be 
described hereinbelow which fetches an image falling within the region inside of the view window in order to 
calculate a vector for the travel of the view window from the contour lines of the candidate for the 
predetermined object image, which shows a movement different from the movement of the background, the 
lines being found in the embodiment for calculating the vector for the travel of the view window from the 
movement of the candidate for the predetermined object image, or from the contour line of the candidate for 
the predetermined object image. The neural network employed in the second embodiment of the first step 
in the method for recognizing an object image in accordance with the present invention is constituted of a 
plurality of neuron layers. Specifically, the neural network is constituted of an input layer 120 for cutting out 
an image part, which falls within the region inside of a view window having a predetermined size, from a 
given image 110. (In cases where a vector for the travel of the view window is calculated from the 
movement of the candidate for the predetermined object image, the signal representing the image 42 shown 
in Figure 36, which represents the contour lines of the candidate for the predetermined object image 
showing a movement different from the movement of the background, is fed into the input layer 120.) The 
neural network is also constituted of a contour line detecting layer 121 for detecting circular arc-like contour 
lines of a candidate for a predetermined object image from the image, which has been cut out (or which has 
been fed into the input layer 120), and a contour line detecting layer 121' for detecting radial contour lines 
of the candidate for the predetermined object image from the image, which has been cut out. The neural 
network is additionally constituted of an end point detecting layer 122 for detecting the end points of the 
radial contour lines, an interactive connection layer 123 for selecting a continuous circular arc-like contour 
line, and an interactive connection layer 123' for selecting a continuous radial contour line. The neural 
network is further constituted of a tilted component extracting layer 124 for extracting all of contour line 
components, which are tilted at a predetermined angle with respect to contours of a group of concentric 
circles surrounding the center point of the view window, from the circular arc-like contour line, which has 
been selected, and a tilted component extracting layer 124' for extracting all of contour line components, 
which are tilted at a predetermined angle with respect to contours of a group of concentric circles 
surrounding the center point of the view window, from the radial contour line, which has been selected. The 
neural network is also constituted of an azimuth vector detecting layer 125 for detecting the azimuths and 
intensities, which represent azimuth vectors of the candidate for the predetermined object image with 
respect to the contours of the group of the concentric circles surrounding the center point of the view 
window, from the tilted components of the circular arc-like contour line, which have been extracted, and an 
azimuth vector detecting layer 125' for detecting the azimuths and intensities, which represent azimuth 
vectors of the candidate for the predetermined object image with respect to the contours of the group of the 
concentric circles surrounding the center point of the view window, from the tilted components of the radial 
contour line, which have been extracted. The neural network is additionally constituted of an azimuth vector 
composing layer 126 for composing an azimuth vector from the detected azimuths and intensities of the 
components of the circular arc-like contour line, and an azimuth vector composing layer 126' for composing 
an azimuth vector from the detected azimuths and intensities of the components of the radial contour line. 
The neural network further has a travel vector output layer 127 for determining a vector for the travel of the 
view window from the azimuth vectors which have been composed. Specifically, the input layer 120 
corresponds to the a- or a'-layer of the neural network shown in Figure 4 or Figure 37 for determining the 
vector for the travel of the view window in the complex-log mapped plane. Also, the contour line detecting 
layers 121 and 121' correspond respectively to the c1- or c1'-layer and the c2- or c2'-layer of the neural 
network shown in Figure 4 or Figure 37. Additionally, the end point detecting layer 122 corresponds to the 
d- or d'-layer of the neural network shown in Figure 4 or Figure 37. Further, the interactive connection layers 
123 and 123' correspond respectively to the e1- or e1'-layer and the e2- or e2'-layer of the neural network 
shown in Figure 4 or Figure 37. Also, the tilted component extracting layers 124 and 124' correspond 
respectively to the f1- or f1'-layer and the f2- or f2'-layer of the neural network shown in Figure 4 or Figure 
37. Additionally, the azimuth vector detecting layers 125 and 125' correspond respectively to the g1- or 
g1'-layer and the g2- or g2'-layer of the neural network shown in Figure 4 or Figure 37. Further, the azimuth 
vector composing layers 126 and 126' correspond respectively to the h1- or h1'-layer and the h2- or 
h2'-layer of the neural network shown in Figure 4 or Figure 37, and the travel vector output layer 127 
corresponds to the i- or i'-layer of the neural network shown in Figure 4 or Figure 37. 
First, in the input layer 120, part of an image falling in the region inside of the view window having a 
predetermined size is cut out and fetched from the outer world. (Alternatively, the signal representing the 
image 42 shown in Figure 36 is fed into the input layer 120.) Information representing the image fetched by 
the input layer 120 is fed into the contour line detecting layers 121 and 121', which detect circular arc-like 
contour lines and radial contour lines. In cases where a candidate for the predetermined object image is 
blocked by a different body located on the foreground side, end points of the radial contour lines occur at 
the boundary between the candidate for the predetermined object image and the different body located on 
the foreground side. In the end point detecting layer 122, such end points of the radial contour lines are 
extracted from the pattern received from the contour line detecting layer 121', in which the radial contour 
lines have been extracted. Thereafter, in the interactive connection layers 123 and 123', a continuous 
circular arc-like contour line and a continuous radial contour line are selected. In the tilted component 
extracting layers 124 and 124', all of the contour line components, which are tilted at a predetermined angle 
with respect to the contours of the group of the concentric circles surrounding the center point of the view 
window, are extracted from the contour lines, which have been selected. 

The weights of the synaptic connections from the interactive connection layer 123 to the tilted
component extracting layer 124 are designed such that the synaptic weights in tilted component extracting
layers 124a and 124b may be distributed in the patterns shown in Figures 60A and 60B. A group of
synaptic weights patterns 130 for detecting a contour line shown in Figure 60A are tilted upwards to the
right at predetermined angles +α and -α with respect to the circumferential directions of a group of
concentric circles, which surround the center point of the view window. Also, a group of synaptic weights
patterns 131 for detecting a contour line shown in Figure 60B are tilted upwards to the left at predetermined
angles +α and -α with respect to the circumferential directions of a group of concentric circles, which
surround the center point of the view window. The group of the synaptic weights patterns 130 for detecting
a contour line and the group of the synaptic weights patterns 131 for detecting a contour line have sizes
such that the range of the connections may become narrower at a region closer to the center point of the
view window. The group of the synaptic weights patterns 130 for detecting a contour line and the group of
the synaptic weights patterns 131 for detecting a contour line detect the components of the circular arc-like
contour line of the candidate for the predetermined object image, which are tilted at the predetermined
angle with respect to the directions along the circumferences around the center point of the view window.
For example, in cases where a circular candidate 132 for the predetermined object image is present at the
position shown in Figure 61A or Figure 61B, of the components of the contour line of the candidate 132 for
the predetermined object image, which line has been selected by the interactive connection layer 123, the
component tilted upwards to the right is detected by a synaptic weights pattern 130a contained in the group
of the synaptic weights patterns 130 for detecting a contour line. Also, the component tilted upwards to the
left is detected by a synaptic weights pattern 131a contained in the group of the synaptic weights patterns
131 for detecting a contour line. Of the azimuths and the intensities of the tilted components which have
thus been detected, the azimuth and the intensity of the component tilted upwards to the right are detected
in an azimuth vector detecting layer 125a. Also, the azimuth and the intensity of the component tilted
upwards to the left are detected in an azimuth vector detecting layer 125b. Thereafter, in the azimuth vector
composing layer 126, an azimuth vector is composed from the azimuths and the intensities, which have
been detected by the azimuth vector detecting layers 125a and 125b. Information representing the azimuth
vector, which has thus been composed, is transmitted to the travel vector output layer 127 and fed out as
the information representing the vector for the travel of the view window.
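
The composition carried out in the azimuth vector composing layer 126 can be illustrated with a short sketch. It assumes that each detected tilted component is summarized by an azimuth (in radians, taken about the center point of the view window) and an intensity, and that the composition is a plain vector sum. The function name and the sample values are illustrative only and do not appear in the specification.

```python
import math

def compose_azimuth_vector(components):
    """Sum (azimuth, intensity) pairs into one travel vector (dx, dy).

    Assumption: each component contributes a vector of length `intensity`
    pointing along `azimuth` (radians) from the view-window center.
    """
    dx = sum(w * math.cos(a) for a, w in components)
    dy = sum(w * math.sin(a) for a, w in components)
    return dx, dy

# Two tilted components found at symmetric azimuths with equal intensity,
# standing in for the outputs of layers 125a and 125b.
right_tilted = (math.radians(45.0), 1.0)
left_tilted = (math.radians(-45.0), 1.0)
travel = compose_azimuth_vector([right_tilted, left_tilted])
```

With symmetric components of equal intensity, the sideways parts cancel and the composed vector points along the axis of symmetry between them.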

The weights of the synaptic connections from the interactive connection layer 123' to the tilted
component extracting layer 124' are designed such that the synaptic weights in tilted component extracting
layers 124a' and 124b' may be distributed in the patterns shown in Figures 62A and 62B. A group of
synaptic weights patterns 130' for detecting a contour line shown in Figure 62A are tilted upwards to the left
at predetermined angles +α' and -α' with respect to the circumferential directions of a group of concentric
circles, which surround the center point of the view window. Also, a group of synaptic weights patterns 131'
for detecting a contour line shown in Figure 62B are tilted upwards to the right at predetermined angles +α'
and -α' with respect to the circumferential directions of a group of concentric circles, which surround the
center point of the view window. The group of the synaptic weights patterns 130' for detecting a contour line
and the group of the synaptic weights patterns 131' for detecting a contour line have sizes such that the
range of the connections may become narrower at a region closer to the center point of the view window.
The group of the synaptic weights patterns 130' for detecting a contour line and the group of the synaptic
weights patterns 131' for detecting a contour line detect the components of the radial contour line of the
candidate for the predetermined object image, which are tilted at the predetermined angle with respect to
the directions along the circumferences around the center point of the view window. For example, in cases
where a point of intersection of contour lines of a candidate 132' for the predetermined object image is
present at the position shown in Figure 63A or Figure 63B, of the components of the contour lines of the
candidate 132' for the predetermined object image, which lines have been selected by the interactive
connection layer 123', the component tilted upwards to the left is detected by a synaptic weights pattern
130a' contained in the group of the synaptic weights patterns 130' for detecting a contour line. Also, the
component tilted upwards to the right is detected by a synaptic weights pattern 131a' contained in the
group of the synaptic weights patterns 131' for detecting a contour line. Of the azimuths and the intensities
of the tilted components which have thus been detected, the azimuth and the intensity of the component
tilted upwards to the left are detected in an azimuth vector detecting layer 125a'. Also, the azimuth and the
intensity of the component tilted upwards to the right are detected in an azimuth vector detecting layer
125b'. Thereafter, in the azimuth vector composing layer 126', an azimuth vector is composed from the
azimuths and the intensities, which have been detected by the azimuth vector detecting layers 125a' and
125b'. Information representing the azimuth vector, which has thus been composed, is transmitted to the
travel vector output layer 127 and fed out as the information representing the vector for the travel of the
view window.

The vectors for the travel of the view window, which have been composed in the azimuth vector
composing layers 126 and 126', will be different from each other, depending on the circumstances under
which the center point of the view window is placed. However, as in the aforesaid first embodiment of the
first step of the method for recognizing an object image in accordance with the present invention, when the
outputs representing the two types of the vectors for the travel of the view window are utilized in a well-
balanced manner in the travel vector output layer 127, the view window can be moved to a predetermined
position regardless of the circumstances under which the center point of the view window is placed.

At the same time as the vectors for the travel of the view window are calculated from the movement
of the candidate for the predetermined object image and from the contour line of the candidate for the
predetermined object image, a vector for the travel of the view window is also calculated from the color
of the candidate for the predetermined object image.

Figure 64 is an explanatory view showing an example of a neural network for extracting a candidate for
a predetermined object image by moving the center point of a view window to the center point of the
candidate for the predetermined object image in this embodiment of the first step in the method for
recognizing an object image in accordance with the present invention. The neural network shown in Figure
64 is constituted of a plurality of neuron layers. Specifically, the neural network is constituted of an input
layer 150 for cutting out an image part, which falls within the region of a view window having a
predetermined size, from a given image 140. The neural network is also constituted of layers 151, 152, and
153 for detecting regions approximately coinciding in color with the candidate for the predetermined object
image in the cut-out image, and an interactive connection layer 154 for selecting a color region having a
predetermined size and a predetermined intensity. The neural network is additionally constituted of an
azimuth vector detecting layer 155 for detecting the azimuth and the distance of the selected color region,
and a travel vector output layer 156 for composing a vector from the detected azimuth and the detected
distance and generating an output representing a vector for the travel of the view window.

Specifically, the input layer 150 corresponds to the a"-layer of the neural network shown in Figure 39
for determining the vector for the travel of the view window in the complex-log mapped plane. Also, the
layers 151, 152, and 153 correspond to the c"-, d"-, and e"-layers of the neural network shown in Figure 39.
Additionally, the interactive connection layer 154 corresponds to the f"-layer of the neural network shown in
Figure 39. Further, the azimuth vector detecting layer 155 corresponds to the g"-layer of the neural network
shown in Figure 39. Also, the travel vector output layer 156 corresponds to the i"-layer of the neural network
shown in Figure 39.

First, in the input layer 150, part of an image falling in the region inside of the view window having a
predetermined size is cut out and fetched from the outer world. Information representing the image fetched
by the input layer 150 is fed into the layers 151, 152, and 153, which detect regions approximately
coinciding in color with the candidate for the predetermined object image. By the layers 151, 152, and 153,
the regions approximately coinciding in color with the candidate for the predetermined object image are
detected, and the degrees of coincidence in color between the detected regions and the candidate for the
predetermined object image are detected. Thereafter, in the interactive connection layer 154, groups of
neurons falling within the regions having high degrees of coincidence in color with the candidate for the
predetermined object image are selected. Thereafter, a group of the neurons falling within the region, which
has the shape and the size most appropriate for the candidate for the predetermined object image, are
extracted from the selected groups of the neurons falling within the regions having high degrees of
coincidence in color with the candidate for the predetermined object image.

In the interactive connection layer 154, the weights of the interactive synaptic connections for selecting
the color region having a predetermined size and a predetermined intensity are set as shown in Figure 65.
As illustrated in Figure 65, a group of synaptic weights patterns 160 for detecting a group of the neurons
exhibiting a high degree of coincidence in color with the candidate for the predetermined object image are
set such that the range of the connections may become narrower at a region closer to the center point of
the view window. With the group of the synaptic weights patterns 160, a group of the neurons exhibiting a
high degree of coincidence in color with the candidate for the predetermined object image are detected.
Thereafter, an azimuth vector is composed from the azimuth and the distance, which have been detected
by the azimuth vector detecting layer 155. Information representing the azimuth vector, which has thus been
composed, is transmitted to the travel vector output layer 156 and fed out as the information representing
the vector for the travel of the view window.
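
The color-based path can be sketched in a few lines. This is not the layered neural network of Figure 64; it is a plain-Python stand-in that assumes the degree of coincidence in color is one minus a normalized Euclidean distance in RGB space, and that the selected region is summarized by its score-weighted centroid. The function name, the threshold, and the sample window are hypothetical.

```python
import math

def color_travel_vector(window, target, threshold=0.8):
    """Return (azimuth, distance) of the region matching `target` color.

    `window` is a square list of rows of (r, g, b) tuples in 0..1, and the
    view-window center is taken as the middle of the grid. Coincidence is
    modelled as 1 minus the normalized RGB distance (an assumption).
    """
    n = len(window)
    c = (n - 1) / 2.0
    max_dist = math.sqrt(3.0)
    sx = sy = total = 0.0
    for y, row in enumerate(window):
        for x, pixel in enumerate(row):
            score = 1.0 - math.dist(pixel, target) / max_dist
            if score >= threshold:
                sx += score * (x - c)
                sy += score * (y - c)
                total += score
    if total == 0.0:
        return None  # no region coincides in color
    cx, cy = sx / total, sy / total
    return math.atan2(cy, cx), math.hypot(cx, cy)

# 5 x 5 black window with one red pixel two columns right of the center
black, red = (0.0, 0.0, 0.0), (1.0, 0.0, 0.0)
window = [[black] * 5 for _ in range(5)]
window[2][4] = red
result = color_travel_vector(window, red)
```

The returned azimuth and distance play the role of the quantities detected by the azimuth vector detecting layer 155, from which the travel vector is then composed.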

In the manner described above, three types of vectors for the travel of the view window are
obtained: one from the neural network in which the vector for the travel of the view window is calculated
from the contour line of the candidate for the predetermined object image, one from the neural network in
which the vector is calculated from the movement of the candidate for the predetermined object image,
and one from the neural network in which the vector is calculated from the color of the candidate for the
predetermined object image. Thereafter, these three types of the vectors
for the travel of the view window are combined with one another in a manner suitable for the candidate for
the predetermined object image, which is to be found. In this manner, an ultimate vector for the travel of the
view window is determined. The view window is caused to travel towards the candidate for the
predetermined object image in accordance with the ultimately determined vector for the travel of the view window.
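
As an illustration of combining the three vectors, the sketch below uses a weighted sum. The specification only states that the combination is made in a manner suitable for the candidate to be found, so the weighted sum, the weight values, and the names used here are assumptions.

```python
def combine_travel_vectors(vectors, weights):
    """Weighted sum of travel vectors keyed by feature name."""
    dx = sum(weights[k] * v[0] for k, v in vectors.items())
    dy = sum(weights[k] * v[1] for k, v in vectors.items())
    return dx, dy

# Hypothetical outputs of the contour, movement, and color networks
vectors = {"contour": (2.0, 0.0), "movement": (0.0, 1.0), "color": (1.0, 1.0)}
# Hypothetical weights, for example favoring the contour for a still object
weights = {"contour": 0.5, "movement": 0.25, "color": 0.25}
ultimate = combine_travel_vectors(vectors, weights)
```

Setting a weight to zero drops the corresponding feature, which also covers the variants in which only one or two of the three features are used.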

The view window is caused to travel in the manner described above, and the center point of the view
window reaches the center point of the candidate for the predetermined object image. Thereafter, as in the
aforesaid embodiment of the first step of the method for recognizing an object image in accordance with the
present invention, the candidate for the predetermined object image is normalized and cut out. Information
representing the cut-out candidate for the predetermined object image is fed into the judgment means. The
judgment means makes a judgment as to whether the candidate for the predetermined object image is or is
not the predetermined object image.

The vectors for the travel of the view window, which have been determined in the embodiment of the
first step in the method in accordance with the present invention, wherein the vector for the travel of the
view window is determined in the Cartesian plane, may be expressed as gradient vectors of a potential field,
and a map of the potential field of the image may thus be created. Extraction of the candidate for the
predetermined object image may be carried out by utilizing the potential field map.
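
One way to picture the use of a potential field map is the following sketch. It assumes that the center of a candidate corresponds to a minimum of the potential and that the travel vector points downhill, which is approximated here by stepping to the lowest 4-connected neighbor on a grid. The function name and the toy field are illustrative only.

```python
def descend(potential, start):
    """Follow the potential field downhill on a grid to a local minimum.

    Assumption: a candidate's center is a minimum of the potential, and the
    travel vector is the negative gradient, approximated by a move to the
    lowest 4-connected neighbor.
    """
    rows, cols = len(potential), len(potential[0])
    y, x = start
    while True:
        best = (potential[y][x], y, x)
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            if 0 <= ny < rows and 0 <= nx < cols:
                best = min(best, (potential[ny][nx], ny, nx))
        if (best[1], best[2]) == (y, x):
            return y, x  # no neighbor is lower: local minimum reached
        y, x = best[1], best[2]

# Toy potential with a single minimum at row 1, column 2
field = [
    [4.0, 3.0, 2.0, 3.0],
    [3.0, 2.0, 0.0, 2.0],
    [4.0, 3.0, 2.0, 3.0],
]
center = descend(field, (0, 0))
```
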

In the aforesaid embodiments of the method for recognizing an object image in accordance with the
present invention, a candidate for the predetermined object image is extracted, and then a judgment is
made as to whether the extracted candidate for the predetermined object image is or is not the
predetermined object image. Alternatively, after a candidate for the predetermined object image has been
extracted, the extracted candidate for the predetermined object image may be taken as a target object and
utilized during the learning operations of the neural network.

Figure 66 is a block diagram showing the fundamental concept behind the learning method for a neural
network in accordance with the present invention. First, in a first step 1', a target object image for learning
operations is extracted from a given image. Thereafter, in a second step 2', learning operations of the neural
network are carried out with respect to the target object image, which has been extracted in the first step 1'.

The aforesaid neural network, which is utilized in the judgment step in the method for recognizing an
object image in accordance with the present invention, may be taken as an example of the neural network,
the learning operations of which are carried out in the second step 2'. Specifically, as illustrated in Figure
53, the neural network, the learning operations of which are carried out with the learning method for a neural
network in accordance with the present invention, employs the aforesaid neocognitron, which is unaffected
by a shift in position. The learning operations are carried out in accordance with feature parts of the object
image and the positions of the feature parts in the object image. The neural network employed in this
embodiment is constituted of three layers, i.e., the Us1-layer 95, the Uc1-layer 96, and the Us2-layer 98,
which is the final output layer. The a1-mask 94 is located before the Us1-layer 95. In this embodiment,
operations of parts 94A, 94B, and 94C of the a1-mask 94 are carried out for learning a plurality of patterns
with respect to the right eye pattern 91, the mouth pattern 92, and the left eye pattern 93, by utilizing the
feature parts of the face image, which have been extracted in the first step 1'. As a result, operations for
learning various face images can be carried out. Also, the a2-mask 97 is located before the Us2-layer 98.
With the a2-mask 97, operations for learning the positions of the right eye pattern 91, the mouth pattern 92,
and the left eye pattern 93, which constitute the feature parts of a given object image 90, are carried out.

The first step 1' of the learning method for a neural network in accordance with the present invention is
the same as the step of the method for recognizing an object image in accordance with the present
invention, in which step a candidate for the predetermined object image is extracted.

In the first step 1' of this embodiment, human face images as well as eye and mouth patterns, which
constitute the feature parts of human face images, are extracted from images in which various human face
images are embedded. The extracted eye and mouth patterns vary for different human face images. For
example, the extracted eye patterns include closed eye patterns, open eye patterns, and half-open eye
patterns. Also, the extracted mouth patterns include open mouth patterns and closed mouth patterns. With
the a1-mask 94 of the neocognitron shown in Figure 53, the learning operations can be carried out by
classifying these feature parts and effecting the self-organization. Specifically, the learning operations with
the self-organization are carried out with the part 94A of the a1-mask 94 for the right eye patterns, with the
part 94B for the mouth patterns, and with the part 94C for the left eye patterns. Also, with the a2-mask 97,
the learning operations are carried out with respect to the relationship between the positions of the eye and
mouth patterns in the whole face images. During the learning operations, in cases where the feature parts,
which have been extracted in the first step 1' of this embodiment, are of the type included in the
classification, information about the new feature parts can be added to the already existing information with
respect to the classification. In this manner, a better neural network can be built up. In cases where the
feature parts, which have been extracted in the first step 1' of this embodiment, are not of the type included
in the classification, new parts for the feature parts are created in the a1-mask 94. In this manner, an
a1-mask 94 having more finely classified parts can be obtained. In this manner, the learning operations of the
neural network can be carried out easily and efficiently, and the performance of the neural network can
thereby be improved even further.
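
The rule of adding to an existing class or creating a new part can be illustrated with a simple stand-in. The sketch below is not the neocognitron or its self-organization; it substitutes nearest-prototype clustering with a distance threshold, and it assumes that each extracted feature part has already been reduced to a numeric vector. The function name, the threshold, and the data are hypothetical.

```python
import math

def learn_feature(prototypes, counts, feature, threshold=0.5):
    """Assign `feature` to the nearest prototype or create a new one.

    Returns the index of the class that absorbed the feature.
    """
    if prototypes:
        dists = [math.dist(p, feature) for p in prototypes]
        i = min(range(len(dists)), key=dists.__getitem__)
        if dists[i] <= threshold:
            # Known type: fold the new information into the existing class
            counts[i] += 1
            n = counts[i]
            prototypes[i] = [p + (f - p) / n
                             for p, f in zip(prototypes[i], feature)]
            return i
    # Unknown type: create a new part for it
    prototypes.append(list(feature))
    counts.append(1)
    return len(prototypes) - 1

prototypes, counts = [], []
labels = [learn_feature(prototypes, counts, f)
          for f in ([0.0, 0.0], [0.2, 0.0], [1.0, 1.0])]
```

Here the first two features fall into one class, whose prototype becomes their running mean, while the third is too far away and opens a second class.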

As explained above with reference to the method for recognizing an object image in accordance with 
the present invention, the neural network, the learning operations of which have been carried out in the 
manner described above, can be utilized to make a judgment as to whether a candidate for the 
predetermined object image is or is not the predetermined object image. 

With the aforesaid embodiment of the learning method for a neural network in accordance with the 
present invention, the neural network carries out its learning operations by itself with respect to the feature 
parts of human face images by effecting the self-organization. Alternatively, the extraction of the feature 
parts may be carried out automatically, and the learning operations may be carried out by manually 
teaching what the feature parts are to the neural network. 

In the aforesaid embodiments of the method for recognizing an object image and the learning method
for a neural network in accordance with the present invention, neural networks are used to extract the
candidate for the predetermined object image or the target object image (hereinafter referred to as the
candidate for the predetermined object image, or the like). The neural networks need not necessarily be
employed, and any other technique may be utilized for this purpose.

In the aforesaid embodiment, wherein the vector for the travel of the view window is calculated from the 
movement of the candidate for the predetermined object image, or the like, the vertical or horizontal 
movement of the image is detected in the Cartesian plane. Also, the movement in the radial or in-plane 
rotating direction is detected after the image is transformed with the complex-log mapping. However, 
transformation with the complex-log mapping need not necessarily be carried out during the detection of the 
image movement, and the image movement may be detected in the Cartesian plane. Also, the vertical or 
horizontal movement of the image may be detected after the image is transformed with the complex-log 
mapping. Additionally, in the aforesaid embodiment, wherein the vector for the travel of the view window is 
calculated from the movement of the candidate for the predetermined object image, or the like, the three 
types of image movements in the vertical or horizontal direction, in the radial direction, and in the in-plane 
rotating direction are detected. In cases where the image movement is constant (e.g., when the image 
moves only in the vertical or horizontal direction), only one type of movement (e.g., only the vertical or 
horizontal movement) may be detected. 
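
The reason radial and in-plane rotating movements are convenient to detect after complex-log mapping can be shown with a small sketch: a point (x, y) taken relative to the pole maps to (log r, θ), so a change of radius becomes a shift along the first axis and a rotation becomes a shift along the second. The function name and the sample points are illustrative.

```python
import cmath

def complex_log_map(point, pole):
    """Map a Cartesian point to (log r, theta) about `pole`."""
    z = complex(point[0] - pole[0], point[1] - pole[1])
    if z == 0:
        raise ValueError("the pole itself has no complex-log image")
    w = cmath.log(z)
    return w.real, w.imag

pole = (0.0, 0.0)
p = complex_log_map((1.0, 1.0), pole)
scaled = complex_log_map((2.0, 2.0), pole)     # radial movement (radius doubled)
rotated = complex_log_map((-1.0, 1.0), pole)   # in-plane rotation by a quarter turn

radial_shift = scaled[0] - p[0]     # shift along the log r axis only
angular_shift = rotated[1] - p[1]   # shift along the theta axis only
```
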

Also, in the aforesaid embodiments of the method for recognizing an object image and the learning 
method for a neural network in accordance with the present invention, neural networks are utilized for all 
steps from the cutting out of the image part falling within the region inside of the view window to the travel 
of the view window. However, the neural networks need not necessarily be utilized for all steps, and a neural 
network may be used in at least one step during the extraction of the candidate for the predetermined 
object image, or the like. Also, the cutting out of the candidate for the predetermined object image, or the 
like, may be carried out by using a neural network. 

Additionally, in the aforesaid embodiments of the method for recognizing an object image and the
learning method for a neural network in accordance with the present invention, the neocognitron is utilized
as the neural network, which carries out recognition in the method for recognizing an object image in
accordance with the present invention, and as the neural network, the learning operations of which are
carried out with the learning method for a neural network in accordance with the present invention. The
neural network used for these purposes is not limited to the neocognitron, and the cognitron or the
perceptron may be employed for these purposes.

Further, in the aforesaid embodiments of the method for recognizing an object image and the learning
method for a neural network in accordance with the present invention, the human face image or the yacht
image is extracted in the extraction step. The method for recognizing an object image and the learning
method for a neural network in accordance with the present invention are also applicable when other types
of predetermined object images are extracted. In such cases, any object image can be extracted efficiently
in the same manner as that in the extraction of the human face image or the yacht image by using a neural
network during the extraction and carrying out the learning operations of the neural network in a manner
suitable for the object image to be extracted.

In the aforesaid embodiment of the learning method for a neural network in accordance with the present
invention, the learning operations are carried out for the neural network for recognizing an object image by
making a judgment as to whether the object image is or is not a human face image. The learning method
for a neural network in accordance with the present invention is also applicable when other target object
images are recognized. In such cases, the learning operations of the neural network can be carried out
efficiently in the same manner as that in the case of the neural network, which recognizes the human face
image, by using a neural network during the extraction of the target object image and carrying out the
learning operations of the neural network in a manner suitable for the object image to be recognized.

Also, in the aforesaid embodiment of the learning method for a neural network in accordance with the
present invention, the learning operations are carried out for the neural network for recognizing an object
image by making a judgment as to whether the object image is or is not a human face image. However, the
neural network, the learning operations of which are carried out, is not limited to the neural network for
making a judgment, and may be, for example, a neural network for recognizing what a given object image
is. In such cases, the learning operations can be carried out efficiently in the same manner as that in the
neural network, which makes a judgment, by extracting an object image, which matches the object image to
be recognized, from a given image, and carrying out the learning operations of the neural network with
respect to the extracted object image.

Additionally, the extraction step in the method for recognizing an object image and the learning method
for a neural network in accordance with the present invention is applicable not only when a candidate for the
predetermined object image, or the like, is extracted from a still image, in which a human face image, or the
like, is embedded, but also when a moving candidate for the predetermined object image, or the like, is
followed up. In such cases, the view window may be caused to travel as in the aforesaid embodiment of the
extraction of the candidate for the predetermined object image, or the like, and the moving candidate for the
predetermined object image, or the like, may thereby be followed up. Thereafter, the candidate for the
predetermined object image, or the like, may be extracted, and a judgment may be made as to whether the
moving candidate for the predetermined object image, or the like, is or is not the predetermined object
image.
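
Following up a moving candidate amounts to repeating the travel of the view window frame by frame. In the sketch below, the travel vector is replaced by a stand-in, namely a fixed fraction of the offset between the window center and the candidate's current position. The `gain` parameter, the function name, and the sample positions are assumptions made for illustration.

```python
def follow(center, target_positions, gain=0.5):
    """Move the view-window center toward a moving target, frame by frame."""
    path = []
    x, y = center
    for tx, ty in target_positions:
        # Stand-in travel vector: a fraction of the offset to the target
        x += gain * (tx - x)
        y += gain * (ty - y)
        path.append((x, y))
    return path

# Candidate positions over three frames (hypothetical values)
track = follow((0.0, 0.0), [(4.0, 0.0), (4.0, 0.0), (4.0, 2.0)])
```

Once the window center has settled on the candidate, extraction and judgment proceed as for a still image.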

Further, in the aforesaid embodiments of the method for recognizing an object image and the learning
method for a neural network in accordance with the present invention, the ultimate vector for the travel of
the view window is determined by composing the three types of the vectors for the travel of the view
window which have been calculated from the three types of features, i.e., the movement, the contour line,
and the color of the candidate for the predetermined object image, or the like. Alternatively, the vector for
the travel of the view window may be determined from the contour line and the color of the candidate for
the predetermined object image, or the like. As another alternative, the vector for the travel of the view
window may be determined from the movement and the contour line of the candidate for the predetermined
object image, or the like. As a further alternative, the vector for the travel of the view window may be
determined from the movement and the color of the candidate for the predetermined object image, or the
like. As a final alternative, the vector for the travel of the view window may be determined from only one of
the movement, the contour line, and the color of the candidate for the predetermined object image, or the
like.

1. A method for recognizing an object image, which comprises the steps of:

i) extracting a candidate for a predetermined object image from an image, and

ii) making a judgment as to whether the extracted candidate for the predetermined object image is
or is not the predetermined object image.

2. A method for recognizing an object image as defined in Claim 1 wherein the extraction of said
candidate for the predetermined object image is carried out by:

a) causing the center point of a view window, which has a predetermined size, to travel to the
position of said candidate for the predetermined object image, and

b) determining an extraction area in accordance with the size and/or the shape of said candidate for
the predetermined object image, the center point of said view window being taken as a reference
during the determination of said extraction area.

3. A method for recognizing an object image as defined in Claim 1 wherein the extraction of said
candidate for the predetermined object image is carried out by:

a) cutting out an image, which falls in a region inside of a view window having a predetermined size,
from said image,

b) detecting a contour line of said candidate for the predetermined object image, which line extends
in a predetermined direction, from said cut-out image,

c) extracting all of components of said detected contour line, which are tilted at a predetermined
angle with respect to contours of a group of concentric circles surrounding the center point of said
view window, from said detected contour line of said candidate for the predetermined object image,

d) detecting azimuths and intensities of said extracted components with respect to the center point
of said view window, the azimuths and the intensities being detected as azimuth vectors,

e) composing a vector from said azimuth vectors, a vector for a travel of said view window being
thereby determined,

f) causing the center point of said view window to travel in accordance with said vector for the travel
of said view window, and

g) determining an extraction area in accordance with the size and/or the shape of said candidate for
the predetermined object image, the center point of said view window, which has thus been caused
to travel, being taken as a reference during the determination of said extraction area.

4. A method for recognizing an object image as defined in Claim 3 wherein the detection of said contour 
line of said candidate for the predetermined object image, which line extends in the predetermined 

30 direction, is carried out after said cut-out image has been transformed with complex-log mapping into a 

complex-log mapped image by taking the center point of said view window as a pole of mapping, 

the extraction of said components of said detected contour line is carried out by extracting all of 
contour line components, which are tilted at a predetermined angle with respect to an annular direction 
in the complex-log mapped plane, from the contour line, which has been detected in said complex-log 
mapped image, and

said azimuth vectors are detected by detecting azimuths and intensities of the extracted contour 
line components in said complex-log mapped plane. 
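
Illustrative note (not part of the claims): the complex-log mapping referred to in Claim 4 sends a point at radius r and angle θ about the pole to the coordinates (log r, θ). The Python sketch below shows this for a single point; the function name and the choice to raise an error at the pole are assumptions made for this example.

```python
import math


def complex_log_map(x, y, pole):
    """Map an image point to (log-radius, angle) about the given pole.

    The pole is assumed to be the center point of the view window.
    """
    px, py = pole
    dx, dy = x - px, y - py
    radius = math.hypot(dx, dy)
    if radius == 0.0:
        # log(0) is undefined, so the pole itself has no mapped position.
        raise ValueError("the pole has no complex-log image")
    return math.log(radius), math.atan2(dy, dx)
```

Under this mapping, points on a circle centered on the pole share the same first coordinate, and points on a ray from the pole share the same second coordinate, which is why annular and radial contour directions become straight axes in the mapped plane.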

5. A method for recognizing an object image as defined in Claim 1 wherein the extraction of said 
candidate for the predetermined object image is carried out by:

a) cutting out an image, which falls in a region inside of a view window having a predetermined size, 
from said image, 

b) extracting a region, which approximately coincides in color with said candidate for the predeter- 
mined object image, from said cut-out image, 

c) detecting an azimuth and a distance of said extracted region with respect to the center point of
said view window,

d) detecting said azimuth and said distance as a vector for a travel of said view window,

e) causing the center point of said view window to travel in accordance with said vector for the travel 
of said view window, and 

f) determining an extraction area in accordance with the size and/or the shape of said candidate for
the predetermined object image, the center point of said view window, which has thus been caused
to travel, being taken as a reference during the determination of said extraction area.
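
Illustrative note (not part of the claims): steps c) and d) of Claim 5 can be pictured with the Python sketch below. It assumes that the color-matched region is given as a 0/1 mask covering the view window and that the region's position is summarized by its centroid; both choices, and the function name, are assumptions made for this example.

```python
import math


def region_azimuth_and_distance(mask):
    """Return (azimuth_rad, distance) of a masked region from the window center.

    mask: 2-D list of 0/1 values, 1 marking pixels that match the target color.
    Returns None when no pixel matches.
    """
    height, width = len(mask), len(mask[0])
    center_x, center_y = (width - 1) / 2.0, (height - 1) / 2.0
    points = [(x, y) for y in range(height) for x in range(width) if mask[y][x]]
    if not points:
        return None
    # Assumption: the centroid stands in for the position of the region.
    centroid_x = sum(x for x, _ in points) / len(points)
    centroid_y = sum(y for _, y in points) / len(points)
    dx, dy = centroid_x - center_x, centroid_y - center_y
    return math.atan2(dy, dx), math.hypot(dx, dy)
```

The returned pair can then be read directly as the vector for the travel of the view window, in the sense of step d).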

6. A method for recognizing an object image as defined in Claim 5 wherein at least a single step selected 
from the group consisting of the extraction of said region, the detection of said azimuth and said distance,
and the detection of said vector for the travel of said view window is carried out on a complex-log
mapped plane in which the center point of said view window is taken as a pole of mapping.

7. A method for recognizing an object image as defined in Claim 5 wherein, of a plurality of regions
approximately coinciding in color with said candidate for the predetermined object image, regions,
which exhibit a high degree of coincidence in color with said candidate for the predetermined object
image and are located at positions close to one another, are caused to cooperate with each other and
are thereby emphasized,

a region, which exhibits a high degree of coincidence in color with said candidate for the 
predetermined object image, and a region, which exhibits a low degree of coincidence in color with said 
candidate for the predetermined object image and is located at a position spaced apart from said 
region exhibiting a high degree of coincidence in color with said candidate for the predetermined object 
image, are caused to compete with each other, said region, which exhibits a low degree of coincidence
in color with said candidate for the predetermined object image, being thereby erased,

regions, which exhibit a high degree of coincidence in color with said candidate for the predeter- 
mined object image and are located at positions spaced apart from each other, are caused to compete 
with each other, a region exhibiting a high degree of coincidence in color with said candidate for the 
predetermined object image, which region has a size and a shape appropriate for the region to be
selected, is kept unerased, whereas a region exhibiting a high degree of coincidence in color with said
candidate for the predetermined object image, which region has a size and a shape inappropriate for 
the region to be selected, is erased, 

whereby a region, which is most appropriate in the region inside of said view window, is selected 
as a predetermined object image region, and

an azimuth and a distance of said selected object image region are detected with respect to the 
center point of said view window. 

8. A method for recognizing an object image as defined in Claim 1 wherein the extraction of said 
candidate for the predetermined object image is carried out by:

a) setting a view window, which has a predetermined size, on said image, 

cutting out a plurality of images, which fall in a region inside of said view window, at a plurality 
of times having a predetermined time difference therebetween, 

detecting contour lines of object images, which are embedded in the plurality of said cut-out 
images,

calculating the difference between images, which represent said detected contour lines, and 
detecting a movement of said image in an in-plane parallel direction in the region inside of said 
view window, the movement being detected from said calculated difference, 

b) detecting contour lines of object images, which are embedded in the plurality of said cut-out 
images, said contour lines extending in a radial direction with respect to the center point of said view
window,

calculating the difference between images, which represent said detected contour lines extend- 
ing in the radial direction, and 

detecting a movement of said image in an in-plane rotating direction in the region inside of said 
view window, the movement being detected from said calculated difference,

c) detecting contour lines of said object images, which are embedded in the plurality of said cut-out 
images, said contour lines extending in an annular direction, 

calculating the difference between images, which represent said detected contour lines extend- 
ing in the annular direction, and 
detecting a movement of said image in the radial direction in the region inside of said view
window, the movement being detected from said calculated difference,

d) compensating for components of a movement of a background in said cut-out images, which fall 
in the region inside of said view window, in accordance with said detected movement of said image 
in the in-plane parallel direction, in the in-plane rotating direction, and/or in the radial direction, a
plurality of images, in which the components of the movement of the background have been
compensated for, being thereby obtained,

e) calculating the difference between the plurality of said images, in which the components of the 
movement of the background have been compensated for, a contour line of an object, which shows 
a movement different from the movement of the background, being thereby detected, 

f) extracting all of components of said detected contour line of said object showing a movement
different from the movement of the background, which are tilted at a predetermined angle with
respect to contours of a group of concentric circles surrounding the center point of said view
window, from said detected contour line of said object showing a movement different from the
movement of the background,

g) detecting azimuths and intensities of said extracted components of said detected contour line of 
said object, which shows the movement different from the movement of the background, with 
respect to the center point of said view window, the azimuths and the intensities being detected as 
azimuth vectors, 

h) composing a vector from said azimuth vectors, a vector for a travel of said view window being 
thereby determined, 

i) causing the center point of said view window to travel in a direction heading towards said object in 
accordance with said vector for the travel of said view window, and 

j) determining an extraction area, from which the candidate for the predetermined object image 
showing a movement with respect to the background is to be extracted, in accordance with the size 
and/or the shape of said object, the center point of said view window, which has thus been caused to 
travel, being taken as a reference during the determination of said extraction area. 

9. A method for recognizing an object image as defined in Claim 8 wherein the detection of said contour
lines of said object images, which are embedded in the plurality of said cut-out images, said contour
lines extending in the radial direction with respect to the center point of said view window, and/or the
detection of said contour lines of said object images, which are embedded in the plurality of said cut-
out images, said contour lines extending in the annular direction, are carried out after said cut-out
images are transformed with complex-log mapping into complex-log mapped images by taking the
center point of said view window as a pole of mapping.

10. A method for recognizing an object image as defined in Claim 8 wherein the detection of said contour 
line of said object, which shows the movement different from the movement of the background, is
carried out after an image representing said difference between the plurality of said images, in which 
the components of the movement of the background have been compensated for, has been trans- 
formed with complex-log mapping into a complex-log mapped image by taking the center point of said 
view window as a pole of mapping, 

the extraction of said components of said detected contour line of said object is carried out by 
extracting all of contour line components, which are tilted at a predetermined angle with respect to an 
annular direction in the complex-log mapped plane, from the contour line, which has been detected in 
said complex-log mapped image, and 

said azimuth vectors are detected by detecting azimuths and intensities of the extracted contour 
line components in said complex-log mapped plane.

11. A method for recognizing an object image as defined in Claim 1 wherein the extraction of said
candidate for the predetermined object image is carried out by: 

a) cutting out a first image, which falls in a region inside of a view window having a predetermined 
size, from said image, 

b) detecting a contour line of an object, which is embedded in said cut-out first image,

c) after a predetermined time has elapsed, cutting out a second image, which falls in the region 
inside of said view window, from said image, 

d) detecting a contour line of an object, which is embedded in said cut-out second image, 

e) calculating the difference between said contour line, which has been detected from said first 
image, and said contour line, which has been detected from said second image,

f) detecting a movement of a background from said calculated difference, 

g) subtracting said detected movement of said background from said image, an object, which shows 
a movement different from the movement of said background, being thereby detected, 

h) recognizing said object, which shows a movement different from the movement of said back-
ground, as said candidate for the predetermined object image,

i) detecting a vector directed towards said candidate for the predetermined object image as a vector 
for a travel of the view window, 

j) causing the center point of said view window to travel in accordance with said vector for the travel 
of said view window, and 

k) determining an extraction area in accordance with the size and/or the shape of said candidate for 
the predetermined object image, the center point of said view window, which has thus been caused 
to travel, being taken as a reference during the determination of said extraction area.

12. A method for recognizing an object image as defined in Claim 1 wherein the extraction of said
candidate for the predetermined object image is carried out by: 

a) cutting out an image, which falls in a region inside of a view window having a predetermined size, 
from said image, 

b) detecting a contour line of said candidate for the predetermined object image, which line extends 
in a predetermined direction, from said cut-out image, 

c) extracting all of components of said detected contour line, which are tilted at a predetermined 
angle with respect to contours of a group of concentric circles surrounding the center point of said 
view window, from said detected contour line of said candidate for the predetermined object image, 

d) detecting azimuths and intensities of said extracted components with respect to the center point 
of said view window, the azimuths and the intensities being detected as first azimuth vectors, 

e) composing a vector from said first azimuth vectors, a first travel vector being thereby determined, 

f) extracting a region, which approximately coincides in color with said candidate for the predeter- 
mined object image, from said cut-out image, 

g) detecting an azimuth and a distance of said extracted region with respect to the center point of 
said view window, 

h) detecting said azimuth and said distance as a second travel vector, 

i) composing a vector from said first and second travel vectors, a vector for a travel of said view 
window being thereby determined, 

j) causing the center point of said view window to travel in accordance with said vector for the travel 
of said view window, and 

k) determining an extraction area in accordance with the size and/or the shape of said candidate for 
the predetermined object image, the center point of said view window, which has thus been caused 
to travel, being taken as a reference during the determination of said extraction area. 
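
Illustrative note (not part of the claims): step i) of Claim 12 composes a single travel vector from travel vectors obtained by different cues (contour and color). The Python sketch below models this as a weighted sum of (dx, dy) vectors; the weighting scheme, the default of equal weights, and the function name are assumptions made for this example, since the claim does not state how the composition is weighted.

```python
def combine_travel_vectors(vectors, weights=None):
    """Combine several (dx, dy) travel vectors into one by a weighted sum.

    weights: optional per-vector weights (hypothetical); defaults to 1.0 each.
    """
    if weights is None:
        weights = [1.0] * len(vectors)
    if len(weights) != len(vectors):
        raise ValueError("one weight is required per travel vector")
    dx = sum(w * v[0] for w, v in zip(weights, vectors))
    dy = sum(w * v[1] for w, v in zip(weights, vectors))
    return dx, dy
```

The same helper would apply unchanged where three travel vectors are composed, as in the claims that add a movement-based travel vector.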

13. A method for recognizing an object image as defined in Claim 1 wherein the extraction of said 
candidate for the predetermined object image is carried out by: 

a) setting a view window, which has a predetermined size, on said image, said image being an 
image including a movement, 

cutting out a plurality of images, which fall in a region inside of said view window, at a plurality 
of times having a predetermined time difference therebetween, and 

detecting a contour line of said candidate for the predetermined object image, which line 
extends in a predetermined direction, from one of the plurality of said cut-out images, 

b) extracting all of components of said detected contour line, which are tilted at a predetermined 
angle with respect to contours of a group of concentric circles surrounding the center point of said 
view window, from said detected contour line of said candidate for the predetermined object image, 

c) detecting azimuths and intensities of said extracted components with respect to the center point 
of said view window, the azimuths and the intensities being detected as first azimuth vectors, 

d) composing a vector from said first azimuth vectors, a first travel vector being thereby determined, 

e) detecting contour lines of object images, which are embedded in the plurality of said cut-out 
images, 

calculating the difference between images, which represent said detected contour lines, and 
detecting a movement of said image in an in-plane parallel direction in the region inside of said 
view window, the movement being detected from said calculated difference, 

f) detecting contour lines of object images, which are embedded in the plurality of said cut-out 
images, said contour lines extending in a radial direction with respect to the center point of said view 
window, 

calculating the difference between images, which represent said detected contour lines extend- 
ing in the radial direction, and 

detecting a movement of said image in an in-plane rotating direction in the region inside of said 
view window, the movement being detected from said calculated difference, 

g) detecting contour lines of said object images, which are embedded in the plurality of said cut-out 
images, said contour lines extending in an annular direction, 

calculating the difference between images, which represent said detected contour lines extend- 
ing in the annular direction, and 

detecting a movement of said image in the radial direction in the region inside of said view 
window, the movement being detected from said calculated difference,

h) compensating for components of a movement of a background in said cut-out images, which fall
in the region inside of said view window, in accordance with said detected movement of said image 
in the in-plane parallel direction, in the in-plane rotating direction, and/or in the radial direction, a 
plurality of images, in which the components of the movement of the background have been 
compensated for, being thereby obtained, 

i) calculating the difference between the plurality of said images, in which the components of the 
movement of the background have been compensated for, a contour line of an object, which shows 
a movement different from the movement of the background, being thereby detected, 

j) extracting all of components of said detected contour line of said object showing a movement 
different from the movement of the background, which are tilted at a predetermined angle with 
respect to contours of a group of concentric circles surrounding the center point of said view 
window, from said detected contour line of said object showing a movement different from the 
movement of the background, 

k) detecting azimuths and intensities of said extracted components of said detected contour line of 
said object, which shows the movement different from the movement of the background, with 
respect to the center point of said view window, the azimuths and the intensities being detected as 
second azimuth vectors, 

l) composing a vector from said second azimuth vectors, a second travel vector being thereby
determined, 

m) composing a vector from said first and second travel vectors, a vector for a travel of said view 
window being thereby determined, 

n) causing the center point of said view window to travel in accordance with said vector for the travel 
of said view window, and 

o) determining an extraction area in accordance with the size and/or the shape of said candidate for 
the predetermined object image, the center point of said view window, which has thus been caused 
to travel, being taken as a reference during the determination of said extraction area. 

14. A method for recognizing an object image as defined in Claim 1 wherein the extraction of said
candidate for the predetermined object image is carried out by: 

a) setting a view window, which has a predetermined size, on said image, said image being an 
image including a movement, 

cutting out a plurality of images, which fall in a region inside of said view window, at a plurality 
of times having a predetermined time difference therebetween, and 

detecting a contour line of said candidate for the predetermined object image, which line 
extends in a predetermined direction, from one of the plurality of said cut-out images, 

b) extracting all of components of said detected contour line, which are tilted at a predetermined 
angle with respect to contours of a group of concentric circles surrounding the center point of said 
view window, from said detected contour line of said candidate for the predetermined object image, 

c) detecting azimuths and intensities of said extracted components with respect to the center point 
of said view window, the azimuths and the intensities being detected as first azimuth vectors, 

d) composing a vector from said first azimuth vectors, a first travel vector being thereby determined, 

e) detecting contour lines of object images, which are embedded in the plurality of said cut-out 
images, 

calculating the difference between images, which represent said detected contour lines, and 
detecting a movement of said image in an in-plane parallel direction in the region inside of said 
view window, the movement being detected from said calculated difference, 

f) detecting contour lines of object images, which are embedded in the plurality of said cut-out 
images, said contour lines extending in a radial direction with respect to the center point of said view 
window, 

calculating the difference between images, which represent said detected contour lines extend- 
ing in the radial direction, and 

detecting a movement of said image in an in-plane rotating direction in the region inside of said 
view window, the movement being detected from said calculated difference, 

g) detecting contour lines of said object images, which are embedded in the plurality of said cut-out 
images, said contour lines extending in an annular direction, 

calculating the difference between images, which represent said detected contour lines extend- 
ing in the annular direction, and 

detecting a movement of said image in the radial direction in the region inside of said view
window, the movement being detected from said calculated difference,

h) compensating for components of a movement of a background in said cut-out images, which fall 
in the region inside of said view window, in accordance with said detected movement of said image 
in the in-plane parallel direction, in the in-plane rotating direction, and/or in the radial direction, a 
plurality of images, in which the components of the movement of the background have been 
compensated for, being thereby obtained,

i) calculating the difference between the plurality of said images, in which the components of the 
movement of the background have been compensated for, a contour line of an object, which shows
a movement different from the movement of the background, being thereby detected, 

j) extracting all of components of said detected contour line of said object showing a movement 
different from the movement of the background, which are tilted at a predetermined angle with 
respect to contours of a group of concentric circles surrounding the center point of said view 
window, from said detected contour line of said object showing a movement different from the 
movement of the background, 

k) detecting azimuths and intensities of said extracted components of said detected contour line of 
said object, which shows the movement different from the movement of the background, with
respect to the center point of said view window, the azimuths and the intensities being detected as 
second azimuth vectors, 

l) composing a vector from said second azimuth vectors, a second travel vector being thereby
determined,

m) extracting a region, which approximately coincides in color with said candidate for the predeter-
mined object image, from one of the plurality of said cut-out images,

n) detecting an azimuth and a distance of said extracted region with respect to the center point of
said view window, 

o) detecting said azimuth and said distance as a third travel vector,

p) composing a vector from said first, second, and third travel vectors, a vector for a travel of said 
view window being thereby determined, 

q) causing the center point of said view window to travel in accordance with said vector for the travel 
of said view window, and 

r) determining an extraction area in accordance with the size and/or the shape of said candidate for 
the predetermined object image, the center point of said view window, which has thus been caused 
to travel, being taken as a reference during the determination of said extraction area. 

15. A method for recognizing an object image as defined in Claim 1 wherein the extraction of said
candidate for the predetermined object image is carried out by:

a) cutting out a first image, which falls in a region inside of a view window having a predetermined 
size, from said image, 

b) detecting a contour line of an object, which is embedded in said cut-out first image,

c) after a predetermined time has elapsed, cutting out a second image, which falls in the region 
inside of said view window, from said image, 

d) detecting a contour line of an object, which is embedded in said cut-out second image,

e) calculating the difference between said contour line, which has been detected from said first
image, and said contour line, which has been detected from said second image, 

f) detecting a movement of a background from said calculated difference, 

g) subtracting said detected movement of said background from said image, an object, which shows
a movement different from the movement of said background, being thereby detected, 

h) recognizing said object, which shows a movement different from the movement of said back- 
ground, as said candidate for the predetermined object image, 

i) detecting a vector directed towards said candidate for the predetermined object image as a first
travel vector,

j) detecting a contour line of said candidate for the predetermined object image, which line extends
in a predetermined direction, from said cut-out first image, 

k) extracting all of components of said detected contour line, which are tilted at a predetermined 
angle with respect to contours of a group of concentric circles surrounding the center point of said
view window, from said detected contour line of said candidate for the predetermined object image,

l) detecting azimuths and intensities of said extracted components with respect to the center point of
said view window, the azimuths and the intensities being detected as second azimuth vectors,

m) composing a vector from said second azimuth vectors, a second travel vector being thereby
determined, 

n) extracting a region, which approximately coincides in color with said candidate for the predeter- 
mined object image, from said cut-out first image, 

o) detecting an azimuth and a distance of said extracted region with respect to the center point of 
said view window, 

p) detecting said azimuth and said distance as a third travel vector, 

q) composing a vector from said first, second, and third travel vectors, a vector for a travel of said 
view window being thereby determined, 

r) causing the center point of said view window to travel in accordance with said vector for the travel 
of said view window, and 

s) determining an extraction area in accordance with the size and/or the shape of said candidate for 
the predetermined object image, the center point of said view window, which has thus been caused 
to travel, being taken as a reference during the determination of said extraction area. 

16. A method for recognizing an object image as defined in Claim 1 wherein the extraction of said 
candidate for the predetermined object image is carried out by: 

a) creating a map of a potential field of the whole area of said image, and 

b) determining an extraction area in accordance with the size and/or the shape of said candidate for 
the predetermined object image, a minimum point of the potential in said map being taken as a 
reference during the determination of said extraction area. 

17. A method for recognizing an object image as defined in Claim 1 wherein the extraction of said 
candidate for the predetermined object image is carried out by: 

a) cutting out an image, which falls in a region inside of a view window having a predetermined size, 
from said image, 

b) detecting a contour line of said candidate for the predetermined object image, which line extends 
in a predetermined direction, from said cut-out image, 

c) extracting all of components of said detected contour line, which are tilted at a predetermined 
angle with respect to contours of a group of concentric circles surrounding the center point of said 
view window, from said detected contour line of said candidate for the predetermined object image, 

d) detecting azimuths and intensities of said extracted components with respect to the center point 
of said view window, the azimuths and the intensities being detected as azimuth vectors, 

e) composing a vector from said azimuth vectors, the composed vector being taken as a gradient 
vector of a potential field in a Cartesian plane having its origin at the center point of said view 
window, 

f) scanning the whole area of said image with said view window, thereby calculating the gradient 
vectors of the potential field with respect to the whole area of said image, 

g) creating a map of the potential field of the whole area of said image from the gradient vectors of 
the potential field, which have been calculated with respect to the whole area of said image, and 

h) determining an extraction area in accordance with the size and/or the shape of said candidate for 
the predetermined object image, a minimum point of the potential in said map being taken as a 
reference during the determination of said extraction area. 
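
Illustrative note (not part of the claims): steps g) and h) of Claim 17 build a potential map from gradient vectors and then locate its minimum. The Python sketch below shows one simple way to do this on a regular grid. It assumes that each stored gradient (gx, gy) is a backward finite difference of the potential, that the field is consistent enough for a single integration path to suffice, and that the sign convention matches; these assumptions and the function names are made for this example only.

```python
def integrate_potential(grad):
    """Rebuild a potential map from per-cell gradients by path integration.

    grad[y][x] is (gx, gy), assumed to equal
    (P[y][x] - P[y][x-1], P[y][x] - P[y-1][x]).
    The potential is fixed to 0.0 at the top-left cell.
    """
    height, width = len(grad), len(grad[0])
    pot = [[0.0] * width for _ in range(height)]
    for x in range(1, width):            # integrate along the first row
        pot[0][x] = pot[0][x - 1] + grad[0][x][0]
    for y in range(1, height):           # then down each column
        for x in range(width):
            pot[y][x] = pot[y - 1][x] + grad[y][x][1]
    return pot


def minimum_point(pot):
    """Return (x, y) of the smallest value in the potential map."""
    cells = ((x, y) for y in range(len(pot)) for x in range(len(pot[0])))
    return min(cells, key=lambda p: pot[p[1]][p[0]])
```

The point returned by `minimum_point` plays the role of the reference from which the extraction area is determined in step h).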

18. A method for recognizing an object image as defined in Claim 17 wherein the detection of said contour 
line of said candidate for the predetermined object image, which line extends in the predetermined 
direction, is carried out after said cut-out image has been transformed with complex-log mapping into a 
complex-log mapped image by taking the center point of said view window as a pole of mapping, 

the extraction of said components of said detected contour line is carried out by extracting all of 
contour line components, which are tilted at a predetermined angle with respect to an annular direction 
in the complex-log mapped plane, from the contour line, which has been detected in said complex-log 
mapped image, and 

said azimuth vectors are detected by detecting azimuths and intensities of the extracted contour 
line components in said complex-log mapped plane. 

19. A method for recognizing an object image as defined in Claim 1 wherein the extraction of said 
candidate for the predetermined object image is carried out by:

a) cutting out an image, which falls in a region inside of a view window having a predetermined size,
from said image,

b) extracting a region, which approximately coincides in color with said candidate for the predeter- 
mined object image, from said cut-out image, 

c) detecting an azimuth and a distance of said extracted region with respect to the center point of
said view window,

d) detecting said azimuth and said distance as a gradient vector of a potential field in a Cartesian 
plane having its origin at the center point of said view window, 

e) scanning the whole area of said image with said view window, thereby calculating the gradient 
vectors of the potential field with respect to the whole area of said image,

f) creating a map of the potential field of the whole area of said image from the gradient vectors of 
the potential field, which have been calculated with respect to the whole area of said image, and 

g) determining an extraction area in accordance with the size and/or the shape of said candidate for 
the predetermined object image, a minimum point of the potential in said map being taken as a
reference during the determination of said extraction area.

20. A method for recognizing an object image as defined in Claim 19 wherein at least a single step 
selected from the group consisting of the extraction of said region, the detection of said azimuth and said
distance, and the detection of said vector for the travel of said view window is carried out on a
complex-log mapped plane in which the center point of said view window is taken as a pole of
mapping.

21. A method for recognizing an object image as defined in Claim 19 wherein, of a plurality of regions 
approximately coinciding in color with said candidate for the predetermined object image, regions,
which exhibit a high degree of coincidence in color with said candidate for the predetermined object
image and are located at positions close to one another, are caused to cooperate with each other and
are thereby emphasized, 

a region, which exhibits a high degree of coincidence in color with said candidate for the 
predetermined object image, and a region, which exhibits a low degree of coincidence in color with said 
candidate for the predetermined object image and is located at a position spaced apart from said 
region exhibiting a high degree of coincidence in color with said candidate for the predetermined object 
image, are caused to compete with each other, said region, which exhibits a low degree of coincidence 
in color with said candidate for the predetermined object image, being thereby erased, 

regions, which exhibit a high degree of coincidence in color with said candidate for the predetermined 
object image and are located at positions spaced apart from each other, are caused to compete 
with each other, a region exhibiting a high degree of coincidence in color with said candidate for the 
predetermined object image, which region has a size and a shape appropriate for the region to be 
selected, is kept unerased, whereas a region exhibiting a high degree of coincidence in color with said 
candidate for the predetermined object image, which region has a size and a shape inappropriate for 
the region to be selected, is erased, 

whereby a region, which is most appropriate in the region inside of said view window, is selected 
as a predetermined object image region, and 

an azimuth and a distance of said selected object image region are detected with respect to the 
center point of said view window. 

22. A method for recognizing an object image as defined in Claim 1 wherein the extraction of said 
candidate for the predetermined object image is carried out by: 

a) setting a view window, which has a predetermined size, on said image, said image being an 
image including a movement, 

cutting out a plurality of images, which fall in a region inside of said view window, at a plurality 
of times having a predetermined time difference therebetween, 

detecting contour lines of object images, which are embedded in the plurality of said cut-out 
images, 

calculating the difference between images, which represent said detected contour lines, and 
detecting a movement of said image in an in-plane parallel direction in the region inside of said 
view window, the movement being detected from said calculated difference, 

b) detecting contour lines of object images, which are embedded in the plurality of said cut-out 
images, said contour lines extending in a radial direction with respect to the center point of said view 
window, 

calculating the difference between images, which represent said detected contour lines extend- 
ing in the radial direction, and 

detecting a movement of said image in an in-plane rotating direction in the region inside of said 
view window, the movement being detected from said calculated difference, 

c) detecting contour lines of said object images, which are embedded in the plurality of said cut-out 
images, said contour lines extending in an annular direction, 

calculating the difference between images, which represent said detected contour lines extend- 
ing in the annular direction, and 
detecting a movement of said image in the radial direction in the region inside of said view 
window, the movement being detected from said calculated difference, 

d) compensating for components of a movement of a background in said cut-out images, which fall 
in the region inside of said view window, in accordance with said detected movement of said image 
in the in-plane parallel direction, in the in-plane rotating direction, and/or in the radial direction, a 
plurality of images, in which the components of the movement of the background have been 
compensated for, being thereby obtained, 

e) calculating the difference between the plurality of said images, in which the components of the 
movement of the background have been compensated for, a contour line of an object, which shows 
a movement different from the movement of the background, being thereby detected, 

f) extracting all of components of said detected contour line of said object showing a movement 
different from the movement of the background, which are tilted at a predetermined angle with 
respect to contours of a group of concentric circles surrounding the center point of said view 
window, from said detected contour line of said object showing a movement different from the 
movement of the background, 

g) detecting azimuths and intensities of said extracted components of said detected contour line of 
said object, which shows the movement different from the movement of the background, with 
respect to the center point of said view window, the azimuths and the intensities being detected as 
azimuth vectors, 

h) composing a vector from said azimuth vectors, the composed vector being taken as a gradient 
vector of a potential field in a Cartesian plane having its origin at the center point of said view 
window, 

i) scanning the whole area of said image with said view window, thereby calculating the gradient 
vectors of the potential field with respect to the whole area of said image, 

j) creating a map of the potential field of the whole area of said image from the gradient vectors of 
the potential field, which have been calculated with respect to the whole area of said image, and 

k) determining an extraction area in accordance with the size and/or the shape of said candidate for 
the predetermined object image, a minimum point of the potential in said map being taken as a 
reference during the determination of said extraction area. 
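
Illustrative sketch (not part of the claims): steps g) and h) above amount to a vector sum. Each azimuth vector is given here as an (azimuth, intensity) pair in polar form, and the composed vector is returned in Cartesian form with the window center as origin. The function name `compose_azimuth_vectors` and the input layout are assumptions for illustration.

```python
import math

def compose_azimuth_vectors(vectors):
    """Sum polar (azimuth_rad, intensity) vectors into one Cartesian vector."""
    vectors = list(vectors)   # allow generators; the data is traversed twice
    gx = sum(i * math.cos(a) for a, i in vectors)
    gy = sum(i * math.sin(a) for a, i in vectors)
    return gx, gy
```

Contributions from opposite azimuths cancel, so the composed vector points toward the side of the window where the extracted contour components are stronger.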

23. A method for recognizing an object image as defined in Claim 22 wherein the detection of said contour 
lines of said object images, which are embedded in the plurality of said cut-out images, said contour 
lines extending in the radial direction with respect to the center point of said view window, and/or the 
detection of said contour lines of said object images, which are embedded in the plurality of said cut-out 
images, said contour lines extending in the annular direction, are carried out after said cut-out 
images are transformed with complex-log mapping into complex-log mapped images by taking the 
center point of said view window as a pole of mapping. 

24. A method for recognizing an object image as defined in Claim 22 wherein the detection of said contour 
line of said object, which shows the movement different from the movement of the background, is 
carried out after an image representing said difference between the plurality of said images, in which 
the components of the movement of the background have been compensated for, has been trans- 
formed with complex-log mapping into a complex-log mapped image by taking the center point of said 
view window as a pole of mapping, 

the extraction of said components of said detected contour line of said object is carried out by 
extracting all of contour line components, which are tilted at a predetermined angle with respect to an 
annular direction in the complex-log mapped plane, from the contour line, which has been detected in 
said complex-log mapped image, and 

said azimuth vectors are detected by detecting azimuths and intensities of the extracted contour 
line components in said complex-log mapped plane. 

25. A method for recognizing an object image as defined in Claim 1 wherein the extraction of said 
candidate for the predetermined object image is carried out by: 
a) cutting out a first image, which falls in a region inside of a view window having a predetermined 
size, from said image, 

b) detecting a contour line of an object, which is embedded in said cut-out first image, 

c) after a predetermined time has elapsed, cutting out a second image, which falls in the region 
inside of said view window, from said image, 

d) detecting a contour line of an object, which is embedded in said cut-out second image, 

e) calculating the difference between said contour line, which has been detected from said first 
image, and said contour line, which has been detected from said second image, 

f) detecting a movement of a background from said calculated difference, 

g) subtracting said detected movement of said background from said image, an object, which shows 
a movement different from the movement of said background, being thereby detected, 

h) recognizing said object, which shows a movement different from the movement of said back- 
ground, as said candidate for the predetermined object image, 

i) detecting a vector directed towards said candidate for the predetermined object image as a 
gradient vector of a potential field in a Cartesian plane having its origin at the center point of said 
view window, 

j) scanning the whole area of said image with said view window, thereby calculating the gradient 
vectors of the potential field with respect to the whole area of said image, 

k) creating a map of the potential field of the whole area of said image from the gradient vectors of 
the potential field, which have been calculated with respect to the whole area of said image, and 
l) determining an extraction area in accordance with the size and/or the shape of said candidate for 
the predetermined object image, a minimum point of the potential in said map being taken as a 
reference during the determination of said extraction area. 
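
Illustrative sketch (not part of the claims): steps e) to g) above can be pictured on binary contour images. The sketch assumes the background movement is a pure in-plane translation of at most `max_shift` pixels and estimates it by exhaustive search; the claim itself does not specify how the movement is estimated. All function names are hypothetical.

```python
def shift(img, dx, dy, fill=0):
    """Translate a 2-D list image by (dx, dy); vacated pixels get `fill`."""
    h, w = len(img), len(img[0])
    out = [[fill] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            sy, sx = y - dy, x - dx
            if 0 <= sy < h and 0 <= sx < w:
                out[y][x] = img[sy][sx]
    return out

def diff_count(a, b):
    """Sum of absolute pixel differences between two equally sized images."""
    return sum(abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb))

def estimate_background_shift(first, second, max_shift=2):
    """Return the (dx, dy) that best aligns `first` with `second`."""
    best = None
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            cost = diff_count(shift(first, dx, dy), second)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best[1], best[2]

def moving_residue(first, second, max_shift=2):
    """Difference image after compensating the estimated background shift."""
    dx, dy = estimate_background_shift(first, second, max_shift)
    compensated = shift(first, dx, dy)
    return [[abs(p - q) for p, q in zip(ra, rb)]
            for ra, rb in zip(compensated, second)]
```

Pixels that remain nonzero in the residue belong to contours that did not follow the background, which is the sense in which an object with a different movement is detected.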

26. A method for recognizing an object image as defined in Claim 1 wherein the extraction of said 
candidate for the predetermined object image is carried out by: 

a) cutting out an image, which falls in a region inside of a view window having a predetermined size, 
from said image, 

b) detecting a contour line of said candidate for the predetermined object image, which line extends 
in a predetermined direction, from said cut-out image, 

c) extracting all of components of said detected contour line, which are tilted at a predetermined 
angle with respect to contours of a group of concentric circles surrounding the center point of said 
view window, from said detected contour line of said candidate for the predetermined object image, 
d) detecting azimuths and intensities of said extracted components with respect to the center point 
of said view window, the azimuths and the intensities being detected as first azimuth vectors, 

e) composing a vector from said first azimuth vectors, a first travel vector being thereby determined, 

f) extracting a region, which approximately coincides in color with said candidate for the predeter- 
mined object image, from said cut-out image, 

g) detecting an azimuth and a distance of said extracted region with respect to the center point of 
said view window, 

h) detecting said azimuth and said distance as a second travel vector, 

i) composing a vector from said first and second travel vectors, the composed vector being taken as 
a gradient vector of a potential field in a Cartesian plane having its origin at the center point of said 
view window, 

j) scanning the whole area of said image with said view window, thereby calculating the gradient 
vectors of the potential field with respect to the whole area of said image, 

k) creating a map of the potential field of the whole area of said image from the gradient vectors of 
the potential field, which have been calculated with respect to the whole area of said image, and 
l) determining an extraction area in accordance with the size and/or the shape of said candidate for 
the predetermined object image, a minimum point of the potential in said map being taken as a 
reference during the determination of said extraction area. 
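
Illustrative sketch (not part of the claims): steps j) to l) above build a potential map from the per-position gradient vectors and take its minimum as the reference point. The sketch assumes the vector at each position points toward the candidate, that is, downhill, so the vector equals minus the gradient of the potential, and it integrates along the first row and then down each column with the trapezoidal rule. The function names and the integration path are assumptions.

```python
def potential_map(vx, vy):
    """Integrate a vector field into a potential U with U[0][0] = 0.

    vx[y][x], vy[y][x] are the components of the vector found with the view
    window centered at (x, y); the vector is assumed to equal -grad U.
    """
    h, w = len(vx), len(vx[0])
    U = [[0.0] * w for _ in range(h)]
    for x in range(1, w):                      # along the first row
        U[0][x] = U[0][x - 1] - 0.5 * (vx[0][x - 1] + vx[0][x])
    for y in range(1, h):                      # then down every column
        for x in range(w):
            U[y][x] = U[y - 1][x] - 0.5 * (vy[y - 1][x] + vy[y][x])
    return U

def minimum_point(U):
    """Return (x, y) of the smallest potential value."""
    return min((U[y][x], x, y)
               for y in range(len(U)) for x in range(len(U[0])))[1:]
```

For a field that is not exactly conservative, the result depends on the integration path, so a least-squares reconstruction may be preferable in practice.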

27. A method for recognizing an object image as defined in Claim 1 wherein the extraction of said 
candidate for the predetermined object image is carried out by: 
a) setting a view window, which has a predetermined size, on said image, said image being an 
image including a movement, 

cutting out a plurality of images, which fall in a region inside of said view window, at a plurality 
of times having a predetermined time difference therebetween, and 
detecting a contour line of said candidate for the predetermined object image, which line 
extends in a predetermined direction, from one of the plurality of said cut-out images, 

b) extracting all of components of said detected contour line, which are tilted at a predetermined 
angle with respect to contours of a group of concentric circles surrounding the center point of said 
view window, from said detected contour line of said candidate for the predetermined object image, 

c) detecting azimuths and intensities of said extracted components with respect to the center point 
of said view window, the azimuths and the intensities being detected as first azimuth vectors, 

d) composing a vector from said first azimuth vectors, a first travel vector being thereby determined, 

e) detecting contour lines of object images, which are embedded in the plurality of said cut-out 
images, 

calculating the difference between images, which represent said detected contour lines, and 

detecting a movement of said image in an in-plane parallel direction in the region inside of said 
view window, the movement being detected from said calculated difference, 

f) detecting contour lines of object images, which are embedded in the plurality of said cut-out 
images, said contour lines extending in a radial direction with respect to the center point of said view 
window, 

calculating the difference between images, which represent said detected contour lines extend- 
ing in the radial direction, and 

detecting a movement of said image in an in-plane rotating direction in the region inside of said 
view window, the movement being detected from said calculated difference, 
g) detecting contour lines of said object images, which are embedded in the plurality of said cut-out 
images, said contour lines extending in an annular direction, 

calculating the difference between images, which represent said detected contour lines extending 
in the annular direction, and 

detecting a movement of said image in the radial direction in the region inside of said view 
window, the movement being detected from said calculated difference, 

h) compensating for components of a movement of a background in said cut-out images, which fall 
in the region inside of said view window, in accordance with said detected movement of said image 
in the in-plane parallel direction, in the in-plane rotating direction, and/or in the radial direction, a 
plurality of images, in which the components of the movement of the background have been 
compensated for, being thereby obtained, 

i) calculating the difference between the plurality of said images, in which the components of the 
movement of the background have been compensated for, a contour line of an object, which shows 
a movement different from the movement of the background, being thereby detected, 

j) extracting all of components of said detected contour line of said object showing a movement 
different from the movement of the background, which are tilted at a predetermined angle with 
respect to contours of a group of concentric circles surrounding the center point of said view 
window, from said detected contour line of said object showing a movement different from the 
movement of the background, 

k) detecting azimuths and intensities of said extracted components of said detected contour line of 
said object, which shows the movement different from the movement of the background, with 
respect to the center point of said view window, the azimuths and the intensities being detected as 
second azimuth vectors, 

l) composing a vector from said second azimuth vectors, a second travel vector being thereby 
determined, 

m) composing a vector from said first and second travel vectors, the composed vector being taken 
as a gradient vector of a potential field in a Cartesian plane having its origin at the center point of 
said view window, 

n) scanning the whole area of said image with said view window, thereby calculating the gradient 
vectors of the potential field with respect to the whole area of said image, 
o) creating a map of the potential field of the whole area of said image from the gradient vectors of 
the potential field, which have been calculated with respect to the whole area of said image, and 
p) determining an extraction area in accordance with the size and/or the shape of said candidate for 
the predetermined object image, a minimum point of the potential in said map being taken as a 
reference during the determination of said extraction area. 

28. A method for recognizing an object image as defined in Claim 1 wherein the extraction of said 
candidate for the predetermined object image is carried out by: 

a) setting a view window, which has a predetermined size, on said image, said image being an 
image including a movement, 

cutting out a plurality of images, which fall in a region inside of said view window, at a plurality 
of times having a predetermined time difference therebetween, and 

detecting a contour line of said candidate for the predetermined object image, which line 
extends in a predetermined direction, from one of the plurality of said cut-out images, 

b) extracting all of components of said detected contour line, which are tilted at a predetermined 
angle with respect to contours of a group of concentric circles surrounding the center point of said 
view window, from said detected contour line of said candidate for the predetermined object image, 

c) detecting azimuths and intensities of said extracted components with respect to the center point 
of said view window, the azimuths and the intensities being detected as first azimuth vectors, 

d) composing a vector from said first azimuth vectors, a first travel vector being thereby determined, 

e) detecting contour lines of object images, which are embedded in the plurality of said cut-out 
images, 

calculating the difference between images, which represent said detected contour lines, and 

detecting a movement of said image in an in-plane parallel direction in the region inside of said 
view window, the movement being detected from said calculated difference, 

f) detecting contour lines of object images, which are embedded in the plurality of said cut-out 
images, said contour lines extending in a radial direction with respect to the center point of said view 
window, 

calculating the difference between images, which represent said detected contour lines extending 
in the radial direction, and 

detecting a movement of said image in an in-plane rotating direction in the region inside of said 
view window, the movement being detected from said calculated difference, 

g) detecting contour lines of said object images, which are embedded in the plurality of said cut-out 
images, said contour lines extending in an annular direction, 

calculating the difference between images, which represent said detected contour lines extending 
in the annular direction, and 

detecting a movement of said image in the radial direction in the region inside of said view 
window, the movement being detected from said calculated difference, 

h) compensating for components of a movement of a background in said cut-out images, which fall 
in the region inside of said view window, in accordance with said detected movement of said image 
in the in-plane parallel direction, in the in-plane rotating direction, and/or in the radial direction, a 
plurality of images, in which the components of the movement of the background have been 
compensated for, being thereby obtained, 

i) calculating the difference between the plurality of said images, in which the components of the 
movement of the background have been compensated for, a contour line of an object, which shows 
a movement different from the movement of the background, being thereby detected, 

j) extracting all of components of said detected contour line of said object showing a movement 
different from the movement of the background, which are tilted at a predetermined angle with 
respect to contours of a group of concentric circles surrounding the center point of said view 
window, from said detected contour line of said object showing a movement different from the 
movement of the background, 

k) detecting azimuths and intensities of said extracted components of said detected contour line of 
said object, which shows the movement different from the movement of the background, with 
respect to the center point of said view window, the azimuths and the intensities being detected as 
second azimuth vectors, 

l) composing a vector from said second azimuth vectors, a second travel vector being thereby 
determined, 

m) extracting a region, which approximately coincides in color with said candidate for the predetermined 
object image, from one of the plurality of said cut-out images, 

n) detecting an azimuth and a distance of said extracted region with respect to the center point of 
said view window, 

o) detecting said azimuth and said distance as a third travel vector, 
p) composing a vector from said first, second, and third travel vectors, the composed vector being 
taken as a gradient vector of a potential field in a Cartesian plane having its origin at the center point 
of said view window, 

q) scanning the whole area of said image with said view window, thereby calculating the gradient 
vectors of the potential field with respect to the whole area of said image, 

r) creating a map of the potential field of the whole area of said image from the gradient vectors of 
the potential field, which have been calculated with respect to the whole area of said image, and 
s) determining an extraction area in accordance with the size and/or the shape of said candidate for 
the predetermined object image, a minimum point of the potential in said map being taken as a 
reference during the determination of said extraction area. 

29. A method for recognizing an object image as defined in Claim 1 wherein the extraction of said 
candidate for the predetermined object image is carried out by: 

a) cutting out a first image, which falls in a region inside of a view window having a predetermined 
size, from said image, 

b) detecting a contour line of an object, which is embedded in said cut-out first image, 

c) after a predetermined time has elapsed, cutting out a second image, which falls in the region 
inside of said view window, from said image, 

d) detecting a contour line of an object, which is embedded in said cut-out second image, 

e) calculating the difference between said contour line, which has been detected from said first 
image, and said contour line, which has been detected from said second image, 

f) detecting a movement of a background from said calculated difference, 

g) subtracting said detected movement of said background from said image, an object, which shows 
a movement different from the movement of said background, being thereby detected, 

h) recognizing said object, which shows a movement different from the movement of said back- 
ground, as said candidate for the predetermined object image, 

i) detecting a vector directed towards said candidate for the predetermined object image as a first 
travel vector, 

j) detecting a contour line of said candidate for the predetermined object image, which line extends 
in a predetermined direction, from said cut-out first image, 

k) extracting all of components of said detected contour line, which are tilted at a predetermined 

angle with respect to contours of a group of concentric circles surrounding the center point of said 

view window, from said detected contour line of said candidate for the predetermined object image, 

l) detecting azimuths and intensities of said extracted components with respect to the center point of 

said view window, the azimuths and the intensities being detected as second azimuth vectors, 

m) composing a vector from said second azimuth vectors, a second travel vector being thereby 

determined, 

n) extracting a region, which approximately coincides in color with said candidate for the predeter- 
mined object image, from said cut-out first image, 

o) detecting an azimuth and a distance of said extracted region with respect to the center point of 
said view window, 

p) detecting said azimuth and said distance as a third travel vector, 

q) composing a vector from said first, second, and third travel vectors, the composed vector being 
taken as a gradient vector of a potential field in a Cartesian plane having its origin at the center point 
of said view window, 

r) scanning the whole area of said image with said view window, thereby calculating the gradient 
vectors of the potential field with respect to the whole area of said image, 

s) creating a map of the potential field of the whole area of said image from the gradient vectors of 
the potential field, which have been calculated with respect to the whole area of said image, and 
t) determining an extraction area in accordance with the size and/or the shape of said candidate for 
the predetermined object image, a minimum point of the potential in said map being taken as a 
reference during the determination of said extraction area. 

30. A method for recognizing an object image as defined in any of Claims 1 through 29 wherein said 
judgment as to whether the extracted candidate for the predetermined object image is or is not the 
predetermined object image is made from feature parts of said candidate for the predetermined object 
image and positions of said feature parts in said candidate for the predetermined object image. 
31. A method for recognizing an object image as defined in Claim 1 wherein said judgment as to whether 
the extracted candidate for the predetermined object image is or is not the predetermined object image 
is made by: 

a) causing a learning means to learn a plurality of feature patterns with respect to each of a plurality 
of feature parts of said predetermined object image, 

b) making judgments as to whether feature parts of said candidate for the predetermined object 
image are or are not included in the plurality of said feature patterns with respect to each of the 
plurality of said feature parts of said predetermined object image, which feature patterns said 
learning means has learned, and 

c) making a judgment as to whether the relationship between the positions of said feature parts of 
said candidate for the predetermined object image coincides or does not coincide with the 
relationship between the positions of said feature parts of said predetermined object image, thereby 
judging whether said candidate for the predetermined object image is or is not said predetermined 
object image. 
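
Illustrative sketch (not part of the claims): steps b) and c) above combine two tests, namely whether each feature part resembles a learned pattern and whether the parts are arranged as in the reference object. The sketch assumes feature parts are numeric vectors compared by Euclidean distance, and that positions are compared after removing translation and scale; rotation is not handled. The names, thresholds, and normalization are all assumptions for illustration.

```python
import math

def matches_any(part, patterns, max_dist):
    """True if `part` lies within `max_dist` of any learned pattern."""
    return any(math.dist(part, p) <= max_dist for p in patterns)

def normalized_layout(positions):
    """Express positions relative to the first part, scaled by the distance
    between the first two parts (names taken in sorted order)."""
    names = sorted(positions)
    ax, ay = positions[names[0]]
    bx, by = positions[names[1]]
    scale = math.hypot(bx - ax, by - ay)
    return {n: ((positions[n][0] - ax) / scale,
                (positions[n][1] - ay) / scale) for n in names}

def is_target(parts, positions, learned, reference_positions,
              max_dist=0.5, tol=0.2):
    """Judge a candidate from its feature parts and their arrangement."""
    if any(not matches_any(parts[n], learned[n], max_dist) for n in learned):
        return False
    cand = normalized_layout(positions)
    ref = normalized_layout(reference_positions)
    return all(math.dist(cand[n], ref[n]) <= tol for n in ref)
```

A candidate whose parts all match but whose arrangement differs, for example a mouth located above the eyes, is rejected by the second test.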

32. A method for recognizing an object image as defined in Claim 31 wherein a neural network is 
employed in said learning means, and Kohonen's self-organization is employed for learning operations 
of said neural network. 
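
Illustrative sketch (not part of the claims): in Kohonen's self-organization, each input selects the unit whose weight vector is closest to it, and that unit and its map neighbors are moved toward the input. The sketch below shows one such update on a one-dimensional map; the learning rate `lr`, the neighborhood `radius`, and the function name are assumptions, and a practical schedule would shrink both over time.

```python
def som_step(weights, x, lr=0.5, radius=1):
    """Apply one Kohonen update in place and return the winning unit index.

    weights : list of weight vectors, one per unit on a 1-D map
    x       : input feature vector
    """
    # Best-matching unit: smallest squared Euclidean distance to the input.
    bmu = min(range(len(weights)),
              key=lambda i: sum((w - v) ** 2 for w, v in zip(weights[i], x)))
    # Move the winner and its neighbors within `radius` toward the input.
    for i in range(len(weights)):
        if abs(i - bmu) <= radius:
            weights[i] = [w + lr * (v - w) for w, v in zip(weights[i], x)]
    return bmu
```

Repeating this step over many feature patterns lets nearby units come to represent similar patterns of the same feature part.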

33. A method for recognizing an object image as defined in Claim 31 wherein a neural network is 
employed in order to make the judgments as to whether the feature parts of said candidate for the 
predetermined object image are or are not included in the plurality of said feature patterns, and/or the 
judgment as to whether the relationship between the positions of said feature parts of said candidate for 
the predetermined object image coincides or does not coincide with the relationship between the 
positions of said feature parts of said predetermined object image. 

34. A method for recognizing an object image as defined in Claim 33 wherein a neocognitron is employed 
as said neural network in order to make the judgments and/or the judgment. 



35. A method for recognizing an object image as defined in Claim 30 wherein said predetermined object 
image is a face image, and the plurality of said feature parts are a right eye pattern, a left eye pattern, 
and a mouth pattern. 

36. A learning method for a neural network, which comprises the steps of: 

i) extracting a target object image, for which learning operations are to be carried out, from an 
image, 

ii) feeding a signal, which represents the extracted target object image, into a neural network, and 

iii) carrying out the learning operations of said neural network in accordance with said input target 
object image. 

37. A learning method for a neural network as defined in Claim 36 wherein the extraction of said target 
object image, for which learning operations are to be carried out, is carried out by: 

a) causing the center point of a view window, which has a predetermined size, to travel to the 
position of said target object image, and 

b) determining an extraction area in accordance with the size and/or the shape of said target object 
image, the center point of said view window being taken as a reference during the determination of 
said extraction area. 

38. A learning method for a neural network as defined in Claim 36 wherein the extraction of said target 
object image is carried out by: 

a) cutting out an image, which falls in a region inside of a view window having a predetermined size, 

from said image, 

b) detecting a contour line of said target object image, which line extends in a predetermined 
direction, from said cut-out image, 

c) extracting all of components of said detected contour line, which are tilted at a predetermined 
angle with respect to contours of a group of concentric circles surrounding the center point of said 
view window, from said detected contour line of said target object image, 
d) detecting azimuths and intensities of said extracted components with respect to the center point 
of said view window, the azimuths and the intensities being detected as azimuth vectors, 

e) composing a vector from said azimuth vectors, a vector for the travel of said view window being 
thereby determined, 

f) causing the center point of said view window to travel in accordance with said vector for the travel 
of said view window, and 

g) determining an extraction area in accordance with the size and/or the shape of said target object 
image, the center point of said view window, which has thus been caused to travel, being taken as a 
reference during the determination of said extraction area. 
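
Illustrative sketch (not part of the claims): steps e) to g) above can be read as an iteration in which the window center follows the travel vector until the vector becomes negligible, after which the extraction area is placed around the resting center. The callback `travel_vector`, the stopping threshold `eps`, and the square extraction area are assumptions for illustration.

```python
import math

def travel_to_target(center, travel_vector, max_steps=50, eps=0.5):
    """Move the window center along the travel vector until it settles.

    travel_vector(x, y) is a caller-supplied (hypothetical) function that
    returns the (dx, dy) composed with the window centered at (x, y).
    """
    x, y = center
    for _ in range(max_steps):
        dx, dy = travel_vector(x, y)
        if math.hypot(dx, dy) < eps:      # window has effectively stopped
            break
        x, y = x + dx, y + dy
    return x, y

def extraction_area(center, size):
    """Square area of side `size` around the settled center (assumed shape),
    returned as (left, top, right, bottom)."""
    x, y = center
    half = size / 2.0
    return x - half, y - half, x + half, y + half
```

In use, `size` would be chosen from the size and/or shape of the target object image, as step g) requires.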

39. A learning method for a neural network as defined in Claim 38 wherein the detection of said contour 
line of said target object image, which line extends in the predetermined direction, is carried out after 
said cut-out image has been transformed with complex-log mapping into a complex-log mapped image 
by taking the center point of said view window as a pole of mapping, 

the extraction of said components of said detected contour line is carried out by extracting all of 
contour line components, which are tilted at a predetermined angle with respect to an annular direction 
in the complex-log mapped plane, from the contour line, which has been detected in said complex-log 
mapped image, and 

said azimuth vectors are detected by detecting azimuths and intensities of the extracted contour 
line components in said complex-log mapped plane. 

40. A learning method for a neural network as defined in Claim 36 wherein the extraction of said target 
object image is carried out by: 

a) cutting out an image, which falls in a region inside of a view window having a predetermined size, 
from said image, 

b) extracting a region, which approximately coincides in color with said target object image, from 
said cut-out image, 

c) detecting an azimuth and a distance of said extracted region with respect to the center point of 
said view window, 

d) detecting said azimuth and said distance as a vector for the travel of said view window, 

e) causing the center point of said view window to travel in accordance with said vector for the travel 
of said view window, and 

f) determining an extraction area in accordance with the size and/or the shape of said target object 
image, the center point of said view window, which has thus been caused to travel, being taken as a 
reference during the determination of said extraction area. 

41. A learning method for a neural network as defined in Claim 40 wherein at least a single step selected 
from the group consisting of the extraction of said region, the detection of said azimuth and said distance, 
and the detection of said vector for the travel of said view window is carried out on a complex-log 
mapped plane in which the center point of said view window is taken as a pole of mapping. 

42. A learning method for a neural network as defined in Claim 40 wherein, of a plurality of regions 
approximately coinciding in color with said target object image, regions, which exhibit a high degree of 
coincidence in color with said target object image and are located at positions close to one another, are 
caused to cooperate with each other and are thereby emphasized, 

a region, which exhibits a high degree of coincidence in color with said target object image, and a 
region, which exhibits a low degree of coincidence in color with said target object image and is located 
at a position spaced apart from said region exhibiting a high degree of coincidence in color with said 
target object image, are caused to compete with each other, said region, which exhibits a low degree of 
coincidence in color with said target object image, being thereby erased,

regions, which exhibit a high degree of coincidence in color with said target object image and are 
located at positions spaced apart from each other, are caused to compete with each other, a region 
exhibiting a high degree of coincidence in color with said target object image, which region has a size 
and a shape appropriate for the region to be selected, is kept unerased, whereas a region exhibiting a
high degree of coincidence in color with said target object image, which region has a size and a shape
inappropriate for the region to be selected, is erased,

whereby a region, which is most appropriate in the region inside of said view window, is selected 
as a target object image region, and

an azimuth and a distance of said selected object image region are detected with respect to the
center point of said view window.

43. A learning method for a neural network as defined in Claim 36 wherein the extraction of said target 
object image is carried out by:

a) setting a view window, which has a predetermined size, on said image, 

cutting out a plurality of images, which fall in a region inside of said view window, at a plurality 
of times having a predetermined time difference therebetween, 

detecting contour lines of object images, which are embedded in the plurality of said cut-out
images,

calculating the difference between images, which represent said detected contour lines, and
detecting a movement of said image in an in-plane parallel direction in the region inside of said
view window, the movement being detected from said calculated difference,

b) detecting contour lines of object images, which are embedded in the plurality of said cut-out
images, said contour lines extending in a radial direction with respect to the center point of said view
window,

calculating the difference between images, which represent said detected contour lines extending
in the radial direction, and

detecting a movement of said image in an in-plane rotating direction in the region inside of said
view window, the movement being detected from said calculated difference,

c) detecting contour lines of said object images, which are embedded in the plurality of said cut-out
images, said contour lines extending in an annular direction,

calculating the difference between images, which represent said detected contour lines extending
in the annular direction, and

detecting a movement of said image in the radial direction in the region inside of said view
window, the movement being detected from said calculated difference,

d) compensating for components of a movement of a background in said cut-out images, which fall
in the region inside of said view window, in accordance with said detected movement of said image
in the in-plane parallel direction, in the in-plane rotating direction, and/or in the radial direction, a
plurality of images, in which the components of the movement of the background have been
compensated for, being thereby obtained,

e) calculating the difference between the plurality of said images, in which the components of the 
movement of the background have been compensated for, a contour line of an object, which shows 
a movement different from the movement of the background, being thereby detected, 

f) extracting all of components of said detected contour line of said object showing a movement
different from the movement of the background, which are tilted at a predetermined angle with
respect to contours of a group of concentric circles surrounding the center point of said view
window, from said detected contour line of said object showing a movement different from the
movement of the background,

g) detecting azimuths and intensities of said extracted components of said detected contour line of
said object, which shows the movement different from the movement of the background, with
respect to the center point of said view window, the azimuths and the intensities being detected as
azimuth vectors,

h) composing a vector from said azimuth vectors, a vector for a travel of said view window being
thereby determined,

i) causing the center point of said view window to travel in a direction heading towards said object in 
accordance with said vector for the travel of said view window, and 

j) determining an extraction area, from which the target object image showing a movement with
respect to the background is to be extracted, in accordance with the size and/or the shape of said
object, the center point of said view window, which has thus been caused to travel, being taken as a
reference during the determination of said extraction area.
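
Step f) of Claim 43 selects contour components by their tilt relative to concentric circles around the center point of the view window. One possible reading is sketched below. The component layout (position, tangent angle, intensity), the tolerance, and the names `tilt_to_circle` and `select_components` are assumptions of this sketch, and the claims do not state how the tilt is measured.

```python
import math

def tilt_to_circle(point, center, tangent_angle):
    """Angle in [0, pi) between a contour component through `point` and the
    concentric circle (around `center`) passing through the same point.

    tangent_angle: orientation of the contour component, in radians.
    """
    phi = math.atan2(point[1] - center[1], point[0] - center[0])
    circle_tangent = phi + math.pi / 2.0  # a circle's tangent is normal to the radius
    return (tangent_angle - circle_tangent) % math.pi

def select_components(components, center, target_tilt, tolerance=0.1):
    """Keep components whose tilt is within `tolerance` of `target_tilt`.

    components: list of ((x, y), tangent_angle, intensity) (assumed layout).
    target_tilt: predetermined angle in [0, pi).
    """
    selected = []
    for point, tangent_angle, intensity in components:
        diff = abs(tilt_to_circle(point, center, tangent_angle) - target_tilt)
        diff = min(diff, math.pi - diff)  # orientations are compared modulo pi
        if diff <= tolerance:
            selected.append((point, tangent_angle, intensity))
    return selected
```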

44. A learning method for a neural network as defined in Claim 43 wherein the detection of said contour
lines of said object images, which are embedded in the plurality of said cut-out images, said contour
lines extending in the radial direction with respect to the center point of said view window, and/or the
detection of said contour lines of said object images, which are embedded in the plurality of said cut-out
images, said contour lines extending in the annular direction, are carried out after said cut-out
images are transformed with complex-log mapping into complex-log mapped images by taking the
center point of said view window as a pole of mapping.

45. A learning method for a neural network as defined in Claim 43 wherein the detection of said contour 
line of said object, which shows the movement different from the movement of the background, is 
carried out after an image representing said difference between the plurality of said images, in which 
the components of the movement of the background have been compensated for, has been trans- 
formed with complex-log mapping into a complex-log mapped image by taking the center point of said 
view window as a pole of mapping, 

the extraction of said components of said detected contour line of said object is carried out by 
extracting all of contour line components, which are tilted at a predetermined angle with respect to an 
annular direction in the complex-log mapped plane, from the contour line, which has been detected in 
said complex-log mapped image, and 

said azimuth vectors are detected by detecting azimuths and intensities of the extracted contour 
line components in said complex-log mapped plane. 

46. A learning method for a neural network as defined in Claim 36 wherein the extraction of said target 
object image is carried out by: 

a) cutting out a first image, which falls in a region inside of a view window having a predetermined
size, from said image,

b) detecting a contour line of an object, which is embedded in said cut-out first image,

c) after a predetermined time has elapsed, cutting out a second image, which falls in the region 
inside of said view window, from said image, 

d) detecting a contour line of an object, which is embedded in said cut-out second image, 

e) calculating the difference between said contour line, which has been detected from said first 
image, and said contour line, which has been detected from said second image, 

f) detecting a movement of a background from said calculated difference, 

g) subtracting said detected movement of said background from said image, an object, which shows 
a movement different from the movement of said background, being thereby detected, 

h) recognizing said object, which shows a movement different from the movement of said background,
as said target object image,

i) detecting a vector directed towards said target object image as a vector for a travel of the view
window,

j) causing the center point of said view window to travel in accordance with said vector for the travel 
of said view window, and 

k) determining an extraction area in accordance with the size and/or the shape of said target object 
image, the center point of said view window, which has thus been caused to travel, being taken as a 
reference during the determination of said extraction area. 
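
Steps e) to g) of Claim 46 can be illustrated with a minimal sketch. It assumes that each contour image is a set of (x, y) edge-pixel coordinates and that the background movement is a pure integer translation found by exhaustive search. Both are simplifications introduced here, and the names `estimate_background_shift` and `residual_after_compensation` are hypothetical.

```python
def estimate_background_shift(contour1, contour2, max_shift=3):
    """Return the (dx, dy) translation of contour1 that leaves the smallest
    difference from contour2. Contours are sets of (x, y) edge pixels."""
    best, best_cost = (0, 0), None
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            shifted = {(x + dx, y + dy) for x, y in contour1}
            cost = len(shifted ^ contour2)  # size of the symmetric difference
            if best_cost is None or cost < best_cost:
                best, best_cost = (dx, dy), cost
    return best

def residual_after_compensation(contour1, contour2, shift):
    """Edge pixels of the second contour that the shifted first contour does
    not explain, i.e. candidates for an object moving unlike the background."""
    dx, dy = shift
    shifted = {(x + dx, y + dy) for x, y in contour1}
    return contour2 - shifted
```

The residual set marks where the target object lies in the second image; a vector from the window center to that set could then serve as the travel vector of step i).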

47. A learning method for a neural network as defined in Claim 36 wherein the extraction of said target 
object image is carried out by:

a) cutting out an image, which falls in a region inside of a view window having a predetermined size,
from said image,

b) detecting a contour line of said target object image, which line extends in a predetermined 
direction, from said cut-out image, 

c) extracting all of components of said detected contour line, which are tilted at a predetermined
angle with respect to contours of a group of concentric circles surrounding the center point of said 
view window, from said detected contour line of said target object image, 

d) detecting azimuths and intensities of said extracted components with respect to the center point 
of said view window, the azimuths and the intensities being detected as first azimuth vectors, 

e) composing a vector from said first azimuth vectors, a first travel vector being thereby determined, 

f) extracting a region, which approximately coincides in color with said target object image, from said 
cut-out image, 

g) detecting an azimuth and a distance of said extracted region with respect to the center point of 
said view window, 

h) detecting said azimuth and said distance as a second travel vector,

i) composing a vector from said first and second travel vectors, a vector for a travel of said view
window being thereby determined,

j) causing the center point of said view window to travel in accordance with said vector for the travel
of said view window, and

k) determining an extraction area in accordance with the size and/or the shape of said target object
image, the center point of said view window, which has thus been caused to travel, being taken as a
reference during the determination of said extraction area.
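
The composing steps e) and i) of Claim 47 can be sketched as plain vector addition. Treating "composing" as a (weighted) vector sum is an assumption of this sketch; the weights `w1` and `w2` and the function names are hypothetical and do not appear in the claims.

```python
import math

def compose(vectors):
    """Sum a list of (azimuth, intensity) pairs; return (azimuth, magnitude)."""
    x = sum(m * math.cos(a) for a, m in vectors)
    y = sum(m * math.sin(a) for a, m in vectors)
    return math.atan2(y, x), math.hypot(x, y)

def travel_vector(first, second, w1=1.0, w2=1.0):
    """Combine the contour-based and color-based travel vectors.

    first, second: (azimuth, magnitude) pairs; w1, w2: assumed weights.
    """
    return compose([(first[0], w1 * first[1]), (second[0], w2 * second[1])])
```

Azimuth vectors that point in opposite directions with equal intensity cancel, so the composed travel vector vanishes once the contour components are distributed evenly around the center point.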

48. A learning method for a neural network as defined in Claim 36 wherein the extraction of said target 
object image is carried out by: 

a) setting a view window, which has a predetermined size, on said image, said image being an
image including a movement,

cutting out a plurality of images, which fall in a region inside of said view window, at a plurality 
of times having a predetermined time difference therebetween, and 

detecting a contour line of said target object image, which line extends in a predetermined 
direction, from one of the plurality of said cut-out images,

b) extracting all of components of said detected contour line, which are tilted at a predetermined
angle with respect to contours of a group of concentric circles surrounding the center point of said
view window, from said detected contour line of said target object image, 

c) detecting azimuths and intensities of said extracted components with respect to the center point 
of said view window, the azimuths and the intensities being detected as first azimuth vectors,

d) composing a vector from said first azimuth vectors, a first travel vector being thereby determined,

e) detecting contour lines of object images, which are embedded in the plurality of said cut-out 
images, 

calculating the difference between images, which represent said detected contour lines, and 
detecting a movement of said image in an in-plane parallel direction in the region inside of said 
view window, the movement being detected from said calculated difference,

f) detecting contour lines of object images, which are embedded in the plurality of said cut-out 
images, said contour lines extending in a radial direction with respect to the center point of said view
window,

calculating the difference between images, which represent said detected contour lines extending
in the radial direction, and

detecting a movement of said image in an in-plane rotating direction in the region inside of said
view window, the movement being detected from said calculated difference,

g) detecting contour lines of said object images, which are embedded in the plurality of said cut-out 
images, said contour lines extending in an annular direction, 

calculating the difference between images, which represent said detected contour lines extend- 
ing in the annular direction, and 

detecting a movement of said image in the radial direction in the region inside of said view
window, the movement being detected from said calculated difference, 

h) compensating for components of a movement of a background in said cut-out images, which fall 
in the region inside of said view window, in accordance with said detected movement of said image 
in the in-plane parallel direction, in the in-plane rotating direction, and/or in the radial direction, a 
plurality of images, in which the components of the movement of the background have been 
compensated for, being thereby obtained, 

i) calculating the difference between the plurality of said images, in which the components of the 
movement of the background have been compensated for, a contour line of an object, which shows
a movement different from the movement of the background, being thereby detected,

j) extracting all of components of said detected contour line of said object showing a movement
different from the movement of the background, which are tilted at a predetermined angle with 
respect to contours of a group of concentric circles surrounding the center point of said view 
window, from said detected contour line of said object showing a movement different from the 
movement of the background, 

k) detecting azimuths and intensities of said extracted components of said detected contour line of 
said object, which shows the movement different from the movement of the background, with 
respect to the center point of said view window, the azimuths and the intensities being detected as 
second azimuth vectors,

l) composing a vector from said second azimuth vectors, a second travel vector being thereby
determined,

m) composing a vector from said first and second travel vectors, a vector for a travel of said view
window being thereby determined,

n) causing the center point of said view window to travel in accordance with said vector for the travel 
of said view window, and 

o) determining an extraction area in accordance with the size and/or the shape of said target object 
image, the center point of said view window, which has thus been caused to travel, being taken as a 
reference during the determination of said extraction area. 

49. A learning method for a neural network as defined in Claim 36 wherein the extraction of said target
object image is carried out by: 

a) setting a view window, which has a predetermined size, on said image, said image being an 
image including a movement, 

cutting out a plurality of images, which fall in a region inside of said view window, at a plurality 
of times having a predetermined time difference therebetween, and 

detecting a contour line of said target object image, which line extends in a predetermined
direction, from one of the plurality of said cut-out images, 

b) extracting all of components of said detected contour line, which are tilted at a predetermined 
angle with respect to contours of a group of concentric circles surrounding the center point of said 
view window, from said detected contour line of said target object image, 

c) detecting azimuths and intensities of said extracted components with respect to the center point 
of said view window, the azimuths and the intensities being detected as first azimuth vectors, 

d) composing a vector from said first azimuth vectors, a first travel vector being thereby determined, 

e) detecting contour lines of object images, which are embedded in the plurality of said cut-out 
images, 

calculating the difference between images, which represent said detected contour lines, and 
detecting a movement of said image in an in-plane parallel direction in the region inside of said 
view window, the movement being detected from said calculated difference, 

f) detecting contour lines of object images, which are embedded in the plurality of said cut-out 
images, said contour lines extending in a radial direction with respect to the center point of said view 
window, 

calculating the difference between images, which represent said detected contour lines extend- 
ing in the radial direction, and 

detecting a movement of said image in an in-plane rotating direction in the region inside of said 
view window, the movement being detected from said calculated difference, 

g) detecting contour lines of said object images, which are embedded in the plurality of said cut-out 
images, said contour lines extending in an annular direction, 

calculating the difference between images, which represent said detected contour lines extend- 
ing in the annular direction, and 

detecting a movement of said image in the radial direction in the region inside of said view 
window, the movement being detected from said calculated difference, 

h) compensating for components of a movement of a background in said cut-out images, which fall 
in the region inside of said view window, in accordance with said detected movement of said image 
in the in-plane parallel direction, in the in-plane rotating direction, and/or in the radial direction, a 
plurality of images, in which the components of the movement of the background have been 
compensated for, being thereby obtained, 

i) calculating the difference between the plurality of said images, in which the components of the 
movement of the background have been compensated for, a contour line of an object, which shows 
a movement different from the movement of the background, being thereby detected, 

j) extracting all of components of said detected contour line of said object showing a movement 
different from the movement of the background, which are tilted at a predetermined angle with 
respect to contours of a group of concentric circles surrounding the center point of said view 
window, from said detected contour line of said object showing a movement different from the 
movement of the background, 

k) detecting azimuths and intensities of said extracted components of said detected contour line of 
said object, which shows the movement different from the movement of the background, with 
respect to the center point of said view window, the azimuths and the intensities being detected as 
second azimuth vectors,

l) composing a vector from said second azimuth vectors, a second travel vector being thereby
determined,

m) extracting a region, which approximately coincides in color with said target object image, from 
one of the plurality of said cut-out images,

n) detecting an azimuth and a distance of said extracted region with respect to the center point of
said view window,

o) detecting said azimuth and said distance as a third travel vector,

p) composing a vector from said first, second, and third travel vectors, a vector for a travel of said
view window being thereby determined,

q) causing the center point of said view window to travel in accordance with said vector for the travel
of said view window, and

r) determining an extraction area in accordance with the size and/or the shape of said target object 
image, the center point of said view window, which has thus been caused to travel, being taken as a 
reference during the determination of said extraction area. 

50. A learning method for a neural network as defined in Claim 36 wherein the extraction of said target 
object image is carried out by: 

a) cutting out a first image, which falls in a region inside of a view window having a predetermined 
size, from said image, 

b) detecting a contour line of an object, which is embedded in said cut-out first image,

c) after a predetermined time has elapsed, cutting out a second image, which falls in the region 
inside of said view window, from said image, 

d) detecting a contour line of an object, which is embedded in said cut-out second image, 

e) calculating the difference between said contour line, which has been detected from said first 
image, and said contour line, which has been detected from said second image,

f) detecting a movement of a background from said calculated difference, 

g) subtracting said detected movement of said background from said image, an object, which shows 
a movement different from the movement of said background, being thereby detected, 

h) recognizing said object, which shows a movement different from the movement of said background,
as said target object image,

i) detecting a vector directed towards said target object image as a first travel vector,

j) detecting a contour line of said target object image, which line extends in a predetermined 
direction, from said cut-out first image, 

k) extracting all of components of said detected contour line, which are tilted at a predetermined
angle with respect to contours of a group of concentric circles surrounding the center point of said
view window, from said detected contour line of said target object image,

l) detecting azimuths and intensities of said extracted components with respect to the center point of
said view window, the azimuths and the intensities being detected as second azimuth vectors,

m) composing a vector from said second azimuth vectors, a second travel vector being thereby
determined,

n) extracting a region, which approximately coincides in color with said target object image, from 
said cut-out first image, 

o) detecting an azimuth and a distance of said extracted region with respect to the center point of 
said view window, 

p) detecting said azimuth and said distance as a third travel vector,

q) composing a vector from said first, second, and third travel vectors, a vector for a travel of said 
view window being thereby determined, 

r) causing the center point of said view window to travel in accordance with said vector for the travel 
of said view window, and 

s) determining an extraction area in accordance with the size and/or the shape of said target object
image, the center point of said view window, which has thus been caused to travel, being taken as a
reference during the determination of said extraction area.

51. A learning method for a neural network as defined in Claim 36 wherein the extraction of said target 
object image is carried out by:

a) creating a map of a potential field of the whole area of said image, and 

b) determining an extraction area in accordance with the size and/or the shape of said target object
image, a minimum point of the potential in said map being taken as a reference during the
determination of said extraction area.

52. A learning method for a neural network as defined in Claim 36 wherein the extraction of said target 
object image is carried out by: 

a) cutting out an image, which falls in a region inside of a view window having a predetermined size,
from said image,

b) detecting a contour line of said target object image, which line extends in a predetermined 
direction, from said cut-out image, 

c) extracting all of components of said detected contour line, which are tilted at a predetermined 
angle with respect to contours of a group of concentric circles surrounding the center point of said 
view window, from said detected contour line of said target object image, 

d) detecting azimuths and intensities of said extracted components with respect to the center point 
of said view window, the azimuths and the intensities being detected as azimuth vectors, 

e) composing a vector from said azimuth vectors, the composed vector being taken as a gradient 
vector of a potential field in a Cartesian plane having its origin at the center point of said view 
window, 

f) scanning the whole area of said image with said view window, thereby calculating the gradient 
vectors of the potential field with respect to the whole area of said image, 

g) creating a map of the potential field of the whole area of said image from the gradient vectors of 
the potential field, which have been calculated with respect to the whole area of said image, and 

h) determining an extraction area in accordance with the size and/or the shape of said target object 
image, a minimum point of the potential in said map being taken as a reference during the 
determination of said extraction area. 
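
Steps f) to h) of Claim 52 can be pictured with the sketch below, which builds a potential map from a grid of gradient vectors and locates its minimum. It assumes that `gx` and `gy` hold the mathematical partial derivatives of the potential on a regular grid (vectors that point towards the object, i.e. downhill, would have to be negated first) and that the field can be integrated along a simple row-then-column path. The trapezoidal path integration and both function names are choices made for this sketch, not steps stated in the claims.

```python
def potential_from_gradients(gx, gy):
    """Integrate a gradient field into a potential map, up to a constant.

    gx[y][x], gy[y][x]: assumed dU/dx and dU/dy at each grid point.
    The top-left grid point is arbitrarily assigned potential 0.
    """
    h, w = len(gx), len(gx[0])
    u = [[0.0] * w for _ in range(h)]
    for x in range(1, w):  # integrate along the first row
        u[0][x] = u[0][x - 1] + 0.5 * (gx[0][x - 1] + gx[0][x])
    for y in range(1, h):  # then down each column
        for x in range(w):
            u[y][x] = u[y - 1][x] + 0.5 * (gy[y - 1][x] + gy[y][x])
    return u

def minimum_point(u):
    """Return (x, y) of the smallest value in the potential map."""
    best = min((val, x, y) for y, row in enumerate(u) for x, val in enumerate(row))
    return best[1], best[2]
```

The minimum point returned here would play the role of the reference for determining the extraction area in step h).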

53. A learning method for a neural network as defined in Claim 52 wherein the detection of said contour 
line of said target object image, which line extends in the predetermined direction, is carried out after 
said cut-out image has been transformed with complex-log mapping into a complex-log mapped image 
by taking the center point of said view window as a pole of mapping, 

the extraction of said components of said detected contour line is carried out by extracting all of 
contour line components, which are tilted at a predetermined angle with respect to an annular direction 
in the complex-log mapped plane, from the contour line, which has been detected in said complex-log 
mapped image, and 

said azimuth vectors are detected by detecting azimuths and intensities of the extracted contour 
line components in said complex-log mapped plane. 

54. A learning method for a neural network as defined in Claim 36 wherein the extraction of said target 
object image is carried out by: 

a) cutting out an image, which falls in a region inside of a view window having a predetermined size,
from said image,

b) extracting a region, which approximately coincides in color with said target object image, from 
said cut-out image, 

c) detecting an azimuth and a distance of said extracted region with respect to the center point of 
said view window, 

d) detecting said azimuth and said distance as a gradient vector of a potential field in a Cartesian 
plane having its origin at the center point of said view window, 

e) scanning the whole area of said image with said view window, thereby calculating the gradient 
vectors of the potential field with respect to the whole area of said image, 

f) creating a map of the potential field of the whole area of said image from the gradient vectors of 
the potential field, which have been calculated with respect to the whole area of said image, and 

g) determining an extraction area in accordance with the size and/or the shape of said target object 
image, a minimum point of the potential in said map being taken as a reference during the 
determination of said extraction area. 

55. A learning method for a neural network as defined in Claim 54 wherein at least a single step selected 
from the group consisting of the extraction of said region, the detection of said azimuth and said distance,
and the detection of said vector for the travel of said view window is carried out on a complex-log
mapped plane in which the center point of said view window is taken as a pole of mapping.

56. A learning method for a neural network as defined in Claim 54 wherein, of a plurality of regions
approximately coinciding in color with said target object image, regions, which exhibit a high degree of
coincidence in color with said target object image and are located at positions close to one another, are
caused to cooperate with each other and are thereby emphasized,

a region, which exhibits a high degree of coincidence in color with said target object image, and a
region, which exhibits a low degree of coincidence in color with said target object image and is located
at a position spaced apart from said region exhibiting a high degree of coincidence in color with said
target object image, are caused to compete with each other, said region, which exhibits a low degree of
coincidence in color with said target object image, being thereby erased,

regions, which exhibit a high degree of coincidence in color with said target object image and are
located at positions spaced apart from each other, are caused to compete with each other, a region
exhibiting a high degree of coincidence in color with said target object image, which region has a size
and a shape appropriate for the region to be selected, is kept unerased, whereas a region exhibiting a
high degree of coincidence in color with said target object image, which region has a size and a shape
inappropriate for the region to be selected, is erased,

whereby a region, which is most appropriate in the region inside of said view window, is selected
as a target object image region, and

an azimuth and a distance of said selected object image region are detected with respect to the
center point of said view window.

57. A learning method for a neural network as defined in Claim 36 wherein the extraction of said target
object image is carried out by:

a) setting a view window, which has a predetermined size, on said image, said image being an
image including a movement,

cutting out a plurality of images, which fall in a region inside of said view window, at a plurality
of times having a predetermined time difference therebetween,

detecting contour lines of object images, which are embedded in the plurality of said cut-out
images,

calculating the difference between images, which represent said detected contour lines, and
detecting a movement of said image in an in-plane parallel direction in the region inside of said
view window, the movement being detected from said calculated difference,

b) detecting contour lines of object images, which are embedded in the plurality of said cut-out
images, said contour lines extending in a radial direction with respect to the center point of said view
window,

calculating the difference between images, which represent said detected contour lines extending
in the radial direction, and

detecting a movement of said image in an in-plane rotating direction in the region inside of said
view window, the movement being detected from said calculated difference,

c) detecting contour lines of said object images, which are embedded in the plurality of said cut-out
images, said contour lines extending in an annular direction,

calculating the difference between images, which represent said detected contour lines extending
in the annular direction, and

detecting a movement of said image in the radial direction in the region inside of said view
window, the movement being detected from said calculated difference,

d) compensating for components of a movement of a background in said cut-out images, which fall
in the region inside of said view window, in accordance with said detected movement of said image
in the in-plane parallel direction, in the in-plane rotating direction, and/or in the radial direction, a
plurality of images, in which the components of the movement of the background have been
compensated for, being thereby obtained,

e) calculating the difference between the plurality of said images, in which the components of the 
movement of the background have been compensated for, a contour line of an object, which shows 
a movement different from the movement of the background, being thereby detected, 

f) extracting all of components of said detected contour line of said object showing a movement 
different from the movement of the background, which are tilted at a predetermined angle with 
respect to contours of a group of concentric circles surrounding the center point of said view 
window, from said detected contour line of said object showing a movement different from the 
movement of the background, 
g) detecting azimuths and intensities of said extracted components of said detected contour line of 
said object, which shows the movement different from the movement of the background, with 
respect to the center point of said view window, the azimuths and the intensities being detected as 
azimuth vectors, 

h) composing a vector from said azimuth vectors, the composed vector being taken as a gradient 
vector of a potential field in a Cartesian plane having its origin at the center point of said view 
window, 

i) scanning the whole area of said image with said view window, thereby calculating the gradient 
vectors of the potential field with respect to the whole area of said image, 
j) creating a map of the potential field of the whole area of said image from the gradient vectors of 
the potential field, which have been calculated with respect to the whole area of said image, and 
k) determining an extraction area in accordance with the size and/or the shape of said target object 
image, a minimum point of the potential in said map being taken as a reference during the 
determination of said extraction area. 
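
Steps g) and h) above can be illustrated with a minimal sketch. It assumes that each extracted contour component has already been reduced to a pair (azimuth in radians, intensity) measured from the center point of the view window, and that "composing" means plain vector addition. The function name and the sample values are hypothetical and are not taken from the claims.

```python
import math

def compose_azimuth_vectors(components):
    """Sum (azimuth_radians, intensity) pairs into one Cartesian vector (x, y)."""
    x = sum(intensity * math.cos(azimuth) for azimuth, intensity in components)
    y = sum(intensity * math.sin(azimuth) for azimuth, intensity in components)
    return x, y

# Hypothetical azimuth vectors seen from the center point of the view window.
components = [(0.0, 2.0), (math.pi / 2, 1.0), (math.pi, 0.5)]

# The composed vector plays the role of the gradient vector at this window position.
gx, gy = compose_azimuth_vectors(components)
print(round(gx, 6), round(gy, 6))
```

Repeating this at every position of the view window would yield one such vector per position, which is the input needed for the map of steps i) and j).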


58. A learning method for a neural network as defined in Claim 57 wherein the detection of said contour 
lines of said object images, which are embedded in the plurality of said cut-out images, said contour 
lines extending in the radial direction with respect to the center point of said view window, and/or the 
detection of said contour lines of said object images, which are embedded in the plurality of said cut-out 
images, said contour lines extending in the annular direction, are carried out after said cut-out 
images are transformed with complex-log mapping into complex-log mapped images by taking the 
center point of said view window as a pole of mapping. 
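
The effect of complex-log mapping about a pole can be sketched as follows. The sketch assumes the standard form of the mapping, in which a point at radius r and angle theta from the pole is sent to (log r, theta). The function name and the sample coordinates are hypothetical. Under this mapping a contour that is annular about the pole has a constant first coordinate, and a contour that is radial has a constant second coordinate, so both become straight lines.

```python
import math

def complex_log_map(x, y, cx, cy):
    """Map image point (x, y) to (log radius, angle) about the pole (cx, cy)."""
    dx, dy = x - cx, y - cy
    r = math.hypot(dx, dy)
    if r == 0.0:
        raise ValueError("the pole itself has no finite complex-log image")
    return math.log(r), math.atan2(dy, dx)

pole = (10.0, 10.0)  # hypothetical center point of the view window

# Points on a circle of radius 4 about the pole (an annular contour).
circle = [(10.0 + 4.0 * math.cos(t), 10.0 + 4.0 * math.sin(t)) for t in (0.0, 1.0, 2.0)]
circle_mapped = [complex_log_map(x, y, *pole) for x, y in circle]

# Points on a ray leaving the pole at 45 degrees (a radial contour).
ray = [(10.0 + d, 10.0 + d) for d in (1.0, 2.0, 5.0)]
ray_mapped = [complex_log_map(x, y, *pole) for x, y in ray]

print(circle_mapped)
print(ray_mapped)
```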

59. A learning method for a neural network as defined in Claim 57 wherein the detection of said contour 
line of said object, which shows the movement different from the movement of the background, is 
carried out after an image representing said difference between the plurality of said images, in which 
the components of the movement of the background have been compensated for, has been trans- 
formed with complex-log mapping into a complex-log mapped image by taking the center point of said 
view window as a pole of mapping, 

the extraction of said components of said detected contour line of said object is carried out by 
extracting all of contour line components, which are tilted at a predetermined angle with respect to an 
annular direction in the complex-log mapped plane, from the contour line, which has been detected in 
said complex-log mapped image, and 

said azimuth vectors are detected by detecting azimuths and intensities of the extracted contour 
line components in said complex-log mapped plane. 

60. A learning method for a neural network as defined in Claim 36 wherein the extraction of said target 
object image is carried out by: 

a) cutting out a first image, which falls in a region inside of a view window having a predetermined 
size, from said image, 

b) detecting a contour line of an object, which is embedded in said cut-out first image, 

c) after a predetermined time has elapsed, cutting out a second image, which falls in the region 
inside of said view window, from said image, 

d) detecting a contour line of an object, which is embedded in said cut-out second image, 

e) calculating the difference between said contour line, which has been detected from said first 
image, and said contour line, which has been detected from said second image, 

f) detecting a movement of a background from said calculated difference, 

g) subtracting said detected movement of said background from said image, an object, which shows 
a movement different from the movement of said background, being thereby detected, 

h) recognizing said object, which shows a movement different from the movement of said back- 
ground, as said target object image, 

i) detecting a vector directed towards said target object image as a gradient vector of a potential 
field in a Cartesian plane having its origin at the center point of said view window, 
j) scanning the whole area of said image with said view window, thereby calculating the gradient 
vectors of the potential field with respect to the whole area of said image, 

k) creating a map of the potential field of the whole area of said image from the gradient vectors of 
the potential field, which have been calculated with respect to the whole area of said image, and 
l) determining an extraction area in accordance with the size and/or the shape of said target object 
image, a minimum point of the potential in said map being taken as a reference during the 
determination of said extraction area. 
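
The creation of a potential map from gradient vectors and the search for its minimum point can be sketched as below. This is an illustration under stated assumptions, not the procedure of the claims: the vectors are treated as the mathematical gradient of the potential (pointing uphill; vectors pointing towards the target would first be negated), the field is sampled on a regular grid, and the potential is recovered by trapezoidal path integration. The function names and the synthetic field are hypothetical.

```python
def potential_from_gradient(gx, gy):
    """Integrate a gridded gradient field (gx, gy) into a potential map.

    gx[y][x] and gy[y][x] are the x and y gradient components at grid cell
    (x, y). The potential is fixed to 0 at cell (0, 0).
    """
    rows, cols = len(gx), len(gx[0])
    u = [[0.0] * cols for _ in range(rows)]
    # Integrate along the first row using the x components.
    for x in range(1, cols):
        u[0][x] = u[0][x - 1] + 0.5 * (gx[0][x - 1] + gx[0][x])
    # Integrate down every column using the y components.
    for y in range(1, rows):
        for x in range(cols):
            u[y][x] = u[y - 1][x] + 0.5 * (gy[y - 1][x] + gy[y][x])
    return u

def minimum_point(u):
    """Return the (x, y) grid cell at which the potential is smallest."""
    cells = ((x, y) for y in range(len(u)) for x in range(len(u[0])))
    return min(cells, key=lambda p: u[p[1]][p[0]])

# Synthetic field: gradient of (x - 3)^2 + (y - 2)^2, a bowl centered on (3, 2).
rows, cols = 5, 6
gx = [[2.0 * (x - 3) for x in range(cols)] for y in range(rows)]
gy = [[2.0 * (y - 2) for x in range(cols)] for y in range(rows)]

potential = potential_from_gradient(gx, gy)
target_x, target_y = minimum_point(potential)
print(target_x, target_y)
```

The minimum point found in this way would serve as the reference for determining the extraction area.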

61. A learning method for a neural network as defined in Claim 36 wherein the extraction of said target 
object image is carried out by: 

a) cutting out an image, which falls in a region inside of a view window having a predetermined size, 
from said image, 

b) detecting a contour line of said target object image, which line extends in a predetermined 
direction, from said cut-out image, 

c) extracting all of components of said detected contour line, which are tilted at a predetermined 
angle with respect to contours of a group of concentric circles surrounding the center point of said 
view window, from said detected contour line of said target object image, 

d) detecting azimuths and intensities of said extracted components with respect to the center point 
of said view window, the azimuths and the intensities being detected as first azimuth vectors, 

e) composing a vector from said first azimuth vectors, a first travel vector being thereby determined, 

f) extracting a region, which approximately coincides in color with said target object image, from said 
cut-out image, 

g) detecting an azimuth and a distance of said extracted region with respect to the center point of 
said view window, 

h) detecting said azimuth and said distance as a second travel vector, 

i) composing a vector from said first and second travel vectors, the composed vector being taken as 
a gradient vector of a potential field in a Cartesian plane having its origin at the center point of said 
view window, 

j) scanning the whole area of said image with said view window, thereby calculating the gradient 
vectors of the potential field with respect to the whole area of said image, 

k) creating a map of the potential field of the whole area of said image from the gradient vectors of 
the potential field, which have been calculated with respect to the whole area of said image, and 
l) determining an extraction area in accordance with the size and/or the shape of said target object 
image, a minimum point of the potential in said map being taken as a reference during the 
determination of said extraction area. 
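
Steps f) to i) above can be sketched as follows. The sketch assumes that the cut-out image is a small grid of RGB tuples, that "approximately coincides in color" means every channel lies within a fixed tolerance of a reference color, and that the extracted region is summarized by its centroid. These choices, the function name, and the sample values (including the contour-based first travel vector) are hypothetical. The y axis points downward, as is usual for image coordinates.

```python
import math

def color_travel_vector(window, target_rgb, tolerance):
    """Return the vector from the window center to the centroid of matching pixels."""
    rows, cols = len(window), len(window[0])
    cx, cy = (cols - 1) / 2.0, (rows - 1) / 2.0
    matches = [
        (x, y)
        for y in range(rows)
        for x in range(cols)
        if all(abs(c - t) <= tolerance for c, t in zip(window[y][x], target_rgb))
    ]
    if not matches:
        return 0.0, 0.0  # no region of the target color inside the window
    mean_x = sum(x for x, _ in matches) / len(matches)
    mean_y = sum(y for _, y in matches) / len(matches)
    return mean_x - cx, mean_y - cy

# Hypothetical 5 x 5 cut-out image: black background with a small colored patch.
target_rgb = (200, 150, 120)
window = [[(0, 0, 0)] * 5 for _ in range(5)]
for x, y in [(3, 1), (4, 1), (3, 2), (4, 2)]:
    window[y][x] = (205, 148, 118)

second_travel = color_travel_vector(window, target_rgb, tolerance=10)
azimuth = math.atan2(second_travel[1], second_travel[0])
distance = math.hypot(*second_travel)

# Hypothetical first travel vector obtained from the contour-based steps.
first_travel = (0.5, 0.5)
gradient_vector = (first_travel[0] + second_travel[0],
                   first_travel[1] + second_travel[1])
print(second_travel, gradient_vector)
```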

62. A learning method for a neural network as defined in Claim 36 wherein the extraction of said target 
object image is carried out by: 
a) setting a view window, which has a predetermined size, on said image, said image being an 
image including a movement, 

cutting out a plurality of images, which fall in a region inside of said view window, at a plurality 
of times having a predetermined time difference therebetween, and 

detecting a contour line of said target object image, which line extends in a predetermined 
direction, from one of the plurality of said cut-out images, 

b) extracting all of components of said detected contour line, which are tilted at a predetermined 
angle with respect to contours of a group of concentric circles surrounding the center point of said 
view window, from said detected contour line of said target object image, 

c) detecting azimuths and intensities of said extracted components with respect to the center point 
of said view window, the azimuths and the intensities being detected as first azimuth vectors, 

d) composing a vector from said first azimuth vectors, a first travel vector being thereby determined, 

e) detecting contour lines of object images, which are embedded in the plurality of said cut-out 
images, 

calculating the difference between images, which represent said detected contour lines, and 
detecting a movement of said image in an in-plane parallel direction in the region inside of said 
view window, the movement being detected from said calculated difference, 

f) detecting contour lines of object images, which are embedded in the plurality of said cut-out 
images, said contour lines extending in a radial direction with respect to the center point of said view 
window, 

calculating the difference between images, which represent said detected contour lines extend- 
ing in the radial direction, and 

detecting a movement of said image in an in-plane rotating direction in the region inside of said 
view window, the movement being detected from said calculated difference, 
g) detecting contour lines of said object images, which are embedded in the plurality of said cut-out 
images, said contour lines extending in an annular direction, 

calculating the difference between images, which represent said detected contour lines extend- 
ing in the annular direction, and 
detecting a movement of said image in the radial direction in the region inside of said view 
window, the movement being detected from said calculated difference, 

h) compensating for components of a movement of a background in said cut-out images, which fall 
in the region inside of said view window, in accordance with said detected movement of said image 
in the in-plane parallel direction, in the in-plane rotating direction, and/or in the radial direction, a 
plurality of images, in which the components of the movement of the background have been 
compensated for, being thereby obtained, 

i) calculating the difference between the plurality of said images, in which the components of the 
movement of the background have been compensated for, a contour line of an object, which shows 
a movement different from the movement of the background, being thereby detected, 

j) extracting all of components of said detected contour line of said object showing a movement 
different from the movement of the background, which are tilted at a predetermined angle with 
respect to contours of a group of concentric circles surrounding the center point of said view 
window, from said detected contour line of said object showing a movement different from the 
movement of the background, 

k) detecting azimuths and intensities of said extracted components of said detected contour line of 
said object, which shows the movement different from the movement of the background, with 
respect to the center point of said view window, the azimuths and the intensities being detected as 
second azimuth vectors, 

l) composing a vector from said second azimuth vectors, a second travel vector being thereby 
determined, 

m) composing a vector from said first and second travel vectors, the composed vector being taken 
as a gradient vector of a potential field in a Cartesian plane having its origin at the center point of 
said view window, 

n) scanning the whole area of said image with said view window, thereby calculating the gradient 
vectors of the potential field with respect to the whole area of said image, 

o) creating a map of the potential field of the whole area of said image from the gradient vectors of 
the potential field, which have been calculated with respect to the whole area of said image, and 
p) determining an extraction area in accordance with the size and/or the shape of said target object 
image, a minimum point of the potential in said map being taken as a reference during the 
determination of said extraction area. 
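
The detection of an in-plane parallel movement from contour images, the compensation for the movement of the background, and the subsequent difference (steps e), h), and i) above) can be sketched in a much simplified form. The sketch assumes binary contour images stored as lists of lists, handles only whole-pixel translation (no rotation and no radial movement), and estimates the translation by an exhaustive search for the shift with the smallest difference. That search is a stand-in chosen for illustration and is not stated in the claims. All names and the test pattern are hypothetical.

```python
def shift(image, dx, dy):
    """Translate a 2-D list image by (dx, dy); uncovered cells become 0."""
    rows, cols = len(image), len(image[0])
    out = [[0] * cols for _ in range(rows)]
    for y in range(rows):
        for x in range(cols):
            sx, sy = x - dx, y - dy
            if 0 <= sx < cols and 0 <= sy < rows:
                out[y][x] = image[sy][sx]
    return out

def difference(a, b):
    """Pixelwise absolute difference of two equally sized images."""
    return [[abs(p - q) for p, q in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]

def estimate_translation(first, second, max_shift=2):
    """Return the (dx, dy) that best moves `first` onto `second`."""
    candidates = [(dx, dy)
                  for dy in range(-max_shift, max_shift + 1)
                  for dx in range(-max_shift, max_shift + 1)]
    return min(candidates,
               key=lambda s: sum(map(sum, difference(shift(first, *s), second))))

# Hypothetical 8 x 8 contour images cut out at two times.
first = [[0] * 8 for _ in range(8)]
second = [[0] * 8 for _ in range(8)]
for y in range(1, 7):
    first[y][2] = 1    # background contour at time 1
    second[y][3] = 1   # same contour at time 2, moved one pixel to the right
first[3][5] = 1        # object at time 1
second[5][6] = 1       # object at time 2, moving differently from the background

dx, dy = estimate_translation(first, second)
compensated = shift(second, -dx, -dy)        # undo the background movement
residual = difference(first, compensated)    # background contours cancel out
moving = sorted((x, y) for y in range(8) for x in range(8) if residual[y][x])
print((dx, dy), moving)
```

Only the pixels belonging to the object that moves differently from the background remain in the residual image.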

63. A learning method for a neural network as defined in Claim 36 wherein the extraction of said target 
object image is carried out by: 

a) setting a view window, which has a predetermined size, on said image, said image being an 
image including a movement, 

cutting out a plurality of images, which fall in a region inside of said view window, at a plurality 
of times having a predetermined time difference therebetween, and 

detecting a contour line of said target object image, which line extends in a predetermined 
direction, from one of the plurality of said cut-out images, 
b) extracting all of components of said detected contour line, which are tilted at a predetermined 
angle with respect to contours of a group of concentric circles surrounding the center point of said 
view window, from said detected contour line of said target object image, 

c) detecting azimuths and intensities of said extracted components with respect to the center point 
of said view window, the azimuths and the intensities being detected as first azimuth vectors, 
d) composing a vector from said first azimuth vectors, a first travel vector being thereby determined, 

e) detecting contour lines of object images, which are embedded in the plurality of said cut-out 
images, 

calculating the difference between images, which represent said detected contour lines, and 
detecting a movement of said image in an in-plane parallel direction in the region inside of said 
view window, the movement being detected from said calculated difference, 

f) detecting contour lines of object images, which are embedded in the plurality of said cut-out 
images, said contour lines extending in a radial direction with respect to the center point of said view 
window, 
calculating the difference between images, which represent said detected contour lines extending 
in the radial direction, and 

detecting a movement of said image in an in-plane rotating direction in the region inside of said 
view window, the movement being detected from said calculated difference, 
g) detecting contour lines of said object images, which are embedded in the plurality of said cut-out 
images, said contour lines extending in an annular direction, 

calculating the difference between images, which represent said detected contour lines extend- 
ing in the annular direction, and 

detecting a movement of said image in the radial direction in the region inside of said view 
window, the movement being detected from said calculated difference, 

h) compensating for components of a movement of a background in said cut-out images, which fall 
in the region inside of said view window, in accordance with said detected movement of said image 
in the in-plane parallel direction, in the in-plane rotating direction, and/or in the radial direction, a 
plurality of images, in which the components of the movement of the background have been 
compensated for, being thereby obtained, 

i) calculating the difference between the plurality of said images, in which the components of the 
movement of the background have been compensated for, a contour line of an object, which shows 
a movement different from the movement of the background, being thereby detected, 

j) extracting all of components of said detected contour line of said object showing a movement 
different from the movement of the background, which are tilted at a predetermined angle with 
respect to contours of a group of concentric circles surrounding the center point of said view 
window, from said detected contour line of said object showing a movement different from the 
movement of the background, 

k) detecting azimuths and intensities of said extracted components of said detected contour line of 
said object, which shows the movement different from the movement of the background, with 
respect to the center point of said view window, the azimuths and the intensities being detected as 
second azimuth vectors, 

l) composing a vector from said second azimuth vectors, a second travel vector being thereby 
determined, 

m) extracting a region, which approximately coincides in color with said target object image, from 
one of the plurality of said cut-out images, 

n) detecting an azimuth and a distance of said extracted region with respect to the center point of 
said view window, 

o) detecting said azimuth and said distance as a third travel vector, 
p) composing a vector from said first, second, and third travel vectors, the composed vector being 
taken as a gradient vector of a potential field in a Cartesian plane having its origin at the center point 
of said view window, 

q) scanning the whole area of said image with said view window, thereby calculating the gradient 
vectors of the potential field with respect to the whole area of said image, 
r) creating a map of the potential field of the whole area of said image from the gradient vectors of 
the potential field, which have been calculated with respect to the whole area of said image, and 
s) determining an extraction area in accordance with the size and/or the shape of said target object 
image, a minimum point of the potential in said map being taken as a reference during the 
determination of said extraction area. 

64. A learning method for a neural network as defined in Claim 36 wherein the extraction of said target 
object image is carried out by: 

a) cutting out a first image, which falls in a region inside of a view window having a predetermined 
size, from said image, 

b) detecting a contour line of an object, which is embedded in said cut-out first image, 

c) after a predetermined time has elapsed, cutting out a second image, which falls in the region 
inside of said view window, from said image, 

d) detecting a contour line of an object, which is embedded in said cut-out second image, 

e) calculating the difference between said contour line, which has been detected from said first 
image, and said contour line, which has been detected from said second image, 

f) detecting a movement of a background from said calculated difference, 

g) subtracting said detected movement of said background from said image, an object, which shows 
a movement different from the movement of said background, being thereby detected, 
h) recognizing said object, which shows a movement different from the movement of said back- 
ground, as said target object image, 

i) detecting a vector directed towards said target object image as a first travel vector, 

j) detecting a contour line of said target object image, which line extends in a predetermined 
direction, from said cut-out first image, 

k) extracting all of components of said detected contour line, which are tilted at a predetermined 
angle with respect to contours of a group of concentric circles surrounding the center point of said 
view window, from said detected contour line of said target object image, 

l) detecting azimuths and intensities of said extracted components with respect to the center point of 
said view window, the azimuths and the intensities being detected as second azimuth vectors, 

m) composing a vector from said second azimuth vectors, a second travel vector being thereby 
determined, 

n) extracting a region, which approximately coincides in color with said target object image, from 
said cut-out first image, 

o) detecting an azimuth and a distance of said extracted region with respect to the center point of 
said view window, 

p) detecting said azimuth and said distance as a third travel vector, 

q) composing a vector from said first, second, and third travel vectors, the composed vector being 
taken as a gradient vector of a potential field in a Cartesian plane having its origin at the center point 
of said view window, 

r) scanning the whole area of said image with said view window, thereby calculating the gradient 
vectors of the potential field with respect to the whole area of said image, 

s) creating a map of the potential field of the whole area of said image from the gradient vectors of 
the potential field, which have been calculated with respect to the whole area of said image, and 
t) determining an extraction area in accordance with the size and/or the shape of said target object 
image, a minimum point of the potential in said map being taken as a reference during the 
determination of said extraction area. 

65. A method for recognizing an object image as defined in Claim 1 wherein the extraction of said 
candidate for the predetermined object image is carried out after the center point of a view window 
having a predetermined size has been caused to travel to the center point of said candidate for the 
predetermined object image, and the size and/or the shape of said candidate for the predetermined 
object image has been normalized by taking the center point of said view window and a contour line of 
said candidate for the predetermined object image as reference. 
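
Normalization of the size of a candidate about the center point of the view window can be sketched as below. The claim does not state a normalization rule, so the sketch assumes one possible choice: the contour is translated so that the center point becomes the origin and is then scaled so that its farthest contour point lies at a fixed reference radius. The function name, the reference radius, and the sample contour are hypothetical.

```python
import math

def normalize_contour(contour, center, target_radius=1.0):
    """Scale contour points about `center` so the farthest lies at `target_radius`."""
    cx, cy = center
    max_radius = max(math.hypot(x - cx, y - cy) for x, y in contour)
    if max_radius == 0.0:
        raise ValueError("contour collapses onto the center point")
    scale = target_radius / max_radius
    return [((x - cx) * scale, (y - cy) * scale) for x, y in contour]

# Hypothetical contour points of a candidate and the window center point.
contour = [(14.0, 10.0), (10.0, 12.0), (8.0, 10.0), (10.0, 9.0)]
center = (10.0, 10.0)

normalized = normalize_contour(contour, center)
print(normalized)
```

After such a normalization, candidates of different sizes occupy the same extent around the center point, which is the property a subsequent judgment step would rely on.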